[PATCH v1] Test: Move pr116278 run test to c-torture [NFC]
From: Pan Li Move the run tests of pr116278 to c-torture and leave the RISC-V asm checks under the RISC-V part. PR target/116278 gcc/testsuite/ChangeLog: * gcc.target/riscv/pr116278-run-1.c: Use compile instead of run test. * gcc.target/riscv/pr116278-run-2.c: Ditto. * gcc.c-torture/execute/pr116278-run-1.c: New test. * gcc.c-torture/execute/pr116278-run-2.c: New test. Signed-off-by: Pan Li --- .../gcc.c-torture/execute/pr116278-run-1.c | 18 ++ .../gcc.c-torture/execute/pr116278-run-2.c | 18 ++ .../gcc.target/riscv/pr116278-run-1.c | 2 +- .../gcc.target/riscv/pr116278-run-2.c | 2 +- 4 files changed, 38 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr116278-run-1.c create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr116278-run-2.c diff --git a/gcc/testsuite/gcc.c-torture/execute/pr116278-run-1.c b/gcc/testsuite/gcc.c-torture/execute/pr116278-run-1.c new file mode 100644 index 000..fa5340c9d58 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr116278-run-1.c @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include <stdint.h> + +int8_t b[1]; +int8_t *d = b; +int32_t c; + +int main() { + b[0] = -40; + uint16_t t = (uint16_t)d[0]; + + c = (t < 0xFFF6 ? t : 0xFFF6) + 9; + + if (c != 65505) +__builtin_abort (); +} diff --git a/gcc/testsuite/gcc.c-torture/execute/pr116278-run-2.c b/gcc/testsuite/gcc.c-torture/execute/pr116278-run-2.c new file mode 100644 index 000..65439d614a1 --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/execute/pr116278-run-2.c @@ -0,0 +1,18 @@ +/* { dg-do run } */ +/* { dg-options "-O2" } */ + +#include <stdint.h> + +int16_t b[1]; +int16_t *d = b; +int64_t c; + +int main() { + b[0] = -40; + uint32_t t = (uint32_t)d[0]; + + c = (t < 0xFFF6u ? 
t : 0xFFF6u) + 9; + + if (c != 4294967265) +__builtin_abort (); +} diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c index d3812bdcdfb..c758fca7975 100644 --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c @@ -1,4 +1,4 @@ -/* { dg-do run { target { riscv_v } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -fdump-rtl-expand-details" } */ #include diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c index 669cd4f003f..a4da8a323f0 100644 --- a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c +++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c @@ -1,4 +1,4 @@ -/* { dg-do run { target { riscv_v } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -fdump-rtl-expand-details" } */ #include -- 2.43.0
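The expected value 65505 in pr116278-run-1.c can be checked by hand: converting the int8_t value -40 to uint16_t gives 65496 (0xFFD8), which is below the 0xFFF6 bound, so the result is 65496 + 9. A minimal C++ sketch of that arithmetic follows; the helper name pr116278_run1_value is hypothetical and is not part of the testsuite.

```cpp
#include <cstdint>

// Hypothetical helper mirroring the body of pr116278-run-1.c.
std::int32_t pr116278_run1_value(std::int8_t v) {
  // Converting a negative int8_t to uint16_t wraps modulo 2^16:
  // -40 becomes 65496 (0xFFD8).
  std::uint16_t t = static_cast<std::uint16_t>(v);
  // Clamp to at most 0xFFF6 (65526); t is promoted to int here.
  int clamped = t < 0xFFF6 ? t : 0xFFF6;
  // Add 9 in int arithmetic, as the test does before storing to c.
  return clamped + 9;
}
```

For -40 this yields 65505, the constant the test checks; inputs from -10 to -1 convert to 65526 or more, hit the 0xFFF6 clamp, and give 65535.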
[PATCH] libstdc++: Remove note from the GCC 4.0.1 days
When I updated one of the links yesterday I noticed we have this obsolete reference to GCC 4.0.1 and binutils 2.15.90.0.1.1 from 19 (nineteen) years ago. I suggest we remove these. Okay? Gerald libstdc++-v3: * doc/xml/manual/prerequisites.xml: Remove note from the GCC 4.0.1 days. * doc/html/manual/setup.html: Regenerate. diff --git a/libstdc++-v3/doc/html/manual/setup.html b/libstdc++-v3/doc/html/manual/setup.html index 78d2a00c50a..d8c5ff65cff 100644 --- a/libstdc++-v3/doc/html/manual/setup.html +++ b/libstdc++-v3/doc/html/manual/setup.html @@ -29,10 +29,7 @@ the tools you will need if you wish to modify the source. Additional data is given here only where it applies to libstdc++. - As of GCC 4.0.1 the minimum version of binutils required to build - libstdc++ is 2.15.90.0.1.1. - Older releases of libstdc++ do not require such a recent version, - but to take full advantage of useful space-saving features and + To take full advantage of useful space-saving features and bug-fixes you should use a recent binutils whenever possible. The configure process will automatically detect and use these features if the underlying support is present. diff --git a/libstdc++-v3/doc/xml/manual/prerequisites.xml b/libstdc++-v3/doc/xml/manual/prerequisites.xml index a3c6e732a77..0efe63bcd46 100644 --- a/libstdc++-v3/doc/xml/manual/prerequisites.xml +++ b/libstdc++-v3/doc/xml/manual/prerequisites.xml @@ -25,10 +25,7 @@ Additional data is given here only where it applies to libstdc++. - As of GCC 4.0.1 the minimum version of binutils required to build - libstdc++ is 2.15.90.0.1.1. - Older releases of libstdc++ do not require such a recent version, - but to take full advantage of useful space-saving features and + To take full advantage of useful space-saving features and bug-fixes you should use a recent binutils whenever possible. The configure process will automatically detect and use these features if the underlying support is present.
Re: [PATCH] libstdc++: Remove note from the GCC 4.0.1 days
On Sun, 18 Aug 2024, 09:53 Gerald Pfeifer, wrote: > When I updated one of the links yesterday I noticed we have this obsolete > reference to GCC 4.0.1 and binutils 2.15.90.0.1.1 from 19 (nineteen) years > ago. > > I suggest we remove these. > > Okay? > OK
[PATCH] c++/modules: Slightly clean up error for referencing TU-local entity
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk? Or should we even just remove the warning entirely? I'm not sure it really adds all that much, since it's usual AFAICT for errors to prevent the intended outputs from being generated. -- >8 -- It was pointed out to me that the current error referencing an internal linkage entity reads almost like an ICE message, with the message finishing with the unhelpful: m.cpp:1:8: error: failed to write compiled module: Bad file data 1 | export module M; |^~ It would probably be clearer to just emit the same warning that we do in other cases where we don't write modules due to errors. gcc/cp/ChangeLog: * module.cc (module_state::write_begin): Return a boolean to indicate errors rather than just doing to->set_error(). (finish_module_processing): Check for failed write_begin and disable module writing in that case. gcc/testsuite/ChangeLog: * g++.dg/modules/block-decl-2.C: Adjust error message. * g++.dg/modules/internal-1.C: Likewise. Signed-off-by: Nathaniel Shead --- gcc/cp/module.cc| 30 - gcc/testsuite/g++.dg/modules/block-decl-2.C | 2 +- gcc/testsuite/g++.dg/modules/internal-1.C | 2 +- 3 files changed, 19 insertions(+), 15 deletions(-) diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc index f4d137b13a1..9f23feece09 100644 --- a/gcc/cp/module.cc +++ b/gcc/cp/module.cc @@ -3681,7 +3681,7 @@ class GTY((chain_next ("%h.parent"), for_user)) module_state { public: /* Read and write module. 
*/ - void write_begin (elf_out *to, cpp_reader *, + bool write_begin (elf_out *to, cpp_reader *, module_state_config &, unsigned &crc); void write_end (elf_out *to, cpp_reader *, module_state_config &, unsigned &crc); @@ -18317,7 +18317,7 @@ ool_cmp (const void *a_, const void *b_) MOD_SNAME_PFX.cfg : config data */ -void +bool module_state::write_begin (elf_out *to, cpp_reader *reader, module_state_config &config, unsigned &crc) { @@ -18395,10 +18395,7 @@ module_state::write_begin (elf_out *to, cpp_reader *reader, table.find_dependencies (this); if (!table.finalize_dependencies ()) -{ - to->set_error (); - return; -} +return false; #if CHECKING_P /* We're done verifying at-most once reading, reset to verify @@ -18595,6 +18592,8 @@ module_state::write_begin (elf_out *to, cpp_reader *reader, // so-controlled. if (false) write_env (to); + + return true; } // Finish module writing after we've emitted all dynamic initializers. @@ -20847,22 +20846,27 @@ finish_module_processing (cpp_reader *reader) cookie = new module_processing_cookie (cmi_name, tmp_name, fd, e); + bool report_error = false; if (errorcount /* Don't write the module if it contains an erroneous template. */ || (erroneous_templates && !erroneous_templates->is_empty ())) - warning_at (state->loc, 0, "not writing module %qs due to errors", - state->get_flatname ()); + report_error = true; else if (cookie->out.begin ()) { - cookie->began = true; - auto loc = input_location; /* So crashes finger-point the module decl. 
*/ - input_location = state->loc; - state->write_begin (&cookie->out, reader, cookie->config, cookie->crc); - input_location = loc; + iloc_sentinel ils = state->loc; + if (state->write_begin (&cookie->out, reader, cookie->config, + cookie->crc)) + cookie->began = true; + else + report_error = true; } + if (report_error) + warning_at (state->loc, 0, "not writing module %qs due to errors", + state->get_flatname ()); + dump.pop (n); timevar_stop (TV_MODULE_EXPORT); diff --git a/gcc/testsuite/g++.dg/modules/block-decl-2.C b/gcc/testsuite/g++.dg/modules/block-decl-2.C index 974e26f9b7a..90f18b30945 100644 --- a/gcc/testsuite/g++.dg/modules/block-decl-2.C +++ b/gcc/testsuite/g++.dg/modules/block-decl-2.C @@ -18,4 +18,4 @@ export extern "C++" auto foo() { return X{}; } -// { dg-prune-output "failed to write compiled module" } +// { dg-prune-output "not writing module" } diff --git a/gcc/testsuite/g++.dg/modules/internal-1.C b/gcc/testsuite/g++.dg/modules/internal-1.C index 45d3bf06f28..399dd68b92e 100644 --- a/gcc/testsuite/g++.dg/modules/internal-1.C +++ b/gcc/testsuite/g++.dg/modules/internal-1.C @@ -1,6 +1,6 @@ // { dg-additional-options "-fmodules-ts" } -export module frob; // { dg-error "failed to write" } +export module frob; // { dg-warning "not writing module" } // { dg-module-cmi !frob } namespace { -- 2.43.2
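The write_begin change turns an error side effect (to->set_error ()) into a return value that finish_module_processing checks, so a single "not writing module" warning covers both failure paths, and it replaces the manual save/restore of input_location with an iloc_sentinel. A minimal C++ sketch of that control flow, assuming iloc_sentinel behaves as a scope guard that saves the current location, overrides it, and restores it on scope exit; every name below (current_location, location_sentinel, write_begin_sketch, finish_sketch) is hypothetical and only stands in for the real GCC code.

```cpp
// Hypothetical stand-in for GCC's global input_location.
int current_location = 0;
// Records the location seen while "writing", for illustration only.
int observed_location = -1;

// Scope guard in the spirit of iloc_sentinel (assumed behaviour).
class location_sentinel {
public:
  location_sentinel(int loc) : saved_(current_location) {
    current_location = loc;
  }
  ~location_sentinel() { current_location = saved_; }
  location_sentinel(const location_sentinel &) = delete;
  location_sentinel &operator=(const location_sentinel &) = delete;

private:
  int saved_;
};

// Stand-in for module_state::write_begin: reports failure through the
// return value instead of marking the output stream as broken.
bool write_begin_sketch(bool dependencies_ok) {
  observed_location = current_location;
  return dependencies_ok;
}

// Stand-in for finish_module_processing; returns true when the single
// "not writing module ... due to errors" warning should be issued.
bool finish_sketch(bool had_errors, bool dependencies_ok, int module_loc) {
  bool report_error = false;
  if (had_errors)
    report_error = true;
  else {
    location_sentinel ils(module_loc);  // restored when this block ends
    if (!write_begin_sketch(dependencies_ok))
      report_error = true;
  }
  return report_error;
}
```

The guard makes the restore automatic on every exit path from the block, which the hand-written save/restore only achieved on the straight-line path.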
[pushed] wwwdocs: projects: Remove parser-related simple project(s)
Christopher pointed out these did not appear applicable any longer. From what I found I agree, so removed this from the beginner projects list. Pushed. Gerald --- htdocs/projects/beginner.html | 37 --- 1 file changed, 37 deletions(-) diff --git a/htdocs/projects/beginner.html b/htdocs/projects/beginner.html index 83efbd86..a6ea9525 100644 --- a/htdocs/projects/beginner.html +++ b/htdocs/projects/beginner.html @@ -164,43 +164,6 @@ following shell command, run from the gcc subdirectory: -Remove as much code from parser actions as possible. - -This goes more or less with the above. Good existing code: - - -expr_no_commas: -expr_no_commas '+' expr_no_commas -{ $$ = parser_build_binary_op ($2, $1, $3); } - - -Bad existing code: - - -cast_expr: -'(' typename ')' cast_expr %prec UNARY -{ tree type; - int SAVED_warn_strict_prototypes = warn_strict_prototypes; - /* This avoids warnings about unprototyped casts on - integers. E.g. "#define SIG_DFL (void(*)())0". */ - if (TREE_CODE ($4) == INTEGER_CST) -warn_strict_prototypes = 0; - type = groktypename ($2); - warn_strict_prototypes = SAVED_warn_strict_prototypes; - $$ = build_c_cast (type, $4); } - - -All the logic here should be moved into a separate function in -c-typeck.c, named something like parser_build_c_cast. The point of -doing this is, the less code in Yacc input files, the easier it is to -rearrange the grammar and/or replace it entirely. Also it makes it -less likely that someone will muck with action code and then forget to -rebuild the generated parser and check it in. - -We also want to minimize the number of helper functions embedded in -the grammar file. - - Break up enormous functions. This is in the same vein as the above, but significantly harder, -- 2.46.0
[patch,avr,applied] Fix PR116407
This fixes "relocation truncated to fit" errors from the linker due to bogus (too small) jump offsets. Johann --AVR: target/116407 - Fix linker error "relocation truncated to fit". Some text peepholes output extra instructions prior to a branch instruction, which increases the jump offset of backward branches. PR target/116407 gcc/ * config/avr/avr-protos.h (avr_jump_mode): Add an int argument. * config/avr/avr.cc (avr_jump_mode): Add an int argument to increase the computed jump offset of backwards branches. * config/avr/avr.md (*dec-and-branchhi!=-1, *dec-and-branchsi!=-1): Increase the jump offset used by avr_jump_mode() as needed. gcc/testsuite/ * gcc.target/avr/torture/pr116407-2.c: New test. * gcc.target/avr/torture/pr116407-4.c: New test. diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h index 7b666f17718..34298b976a7 100644 --- a/gcc/config/avr/avr-protos.h +++ b/gcc/config/avr/avr-protos.h @@ -115,7 +115,7 @@ extern const char* avr_out_reload_inpsi (rtx*, rtx, int*); extern const char* avr_out_lpm (rtx_insn *, rtx*, int*); extern void avr_notice_update_cc (rtx body, rtx_insn *insn); extern int reg_unused_after (rtx_insn *insn, rtx reg); -extern int avr_jump_mode (rtx x, rtx_insn *insn); +extern int avr_jump_mode (rtx x, rtx_insn *insn, int = 0); extern int test_hard_reg_class (enum reg_class rclass, rtx x); extern int jump_over_one_insn_p (rtx_insn *insn, rtx dest); diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc index 8c19bcb34a6..c520b98a178 100644 --- a/gcc/config/avr/avr.cc +++ b/gcc/config/avr/avr.cc @@ -4133,19 +4133,22 @@ avr_use_by_pieces_infrastructure_p (unsigned HOST_WIDE_INT size, /* Choose mode for jump insn: 1 - relative jump in range -63 <= x <= 62 ; 2 - relative jump in range -2046 <= x <= 2045 ; - 3 - absolute jump (only for ATmega[16]03). */ + 3 - absolute jump (only when we have JMP / CALL). + + When jumping backwards, assume the jump offset is EXTRA words + bigger than inferred from insn addresses.
*/ int -avr_jump_mode (rtx x, rtx_insn *insn) +avr_jump_mode (rtx x, rtx_insn *insn, int extra) { int dest_addr = INSN_ADDRESSES (INSN_UID (GET_CODE (x) == LABEL_REF ? XEXP (x, 0) : x)); int cur_addr = INSN_ADDRESSES (INSN_UID (insn)); int jump_distance = cur_addr - dest_addr; - if (IN_RANGE (jump_distance, -63, 62)) + if (IN_RANGE (jump_distance, -63, 62 - extra)) return 1; - else if (IN_RANGE (jump_distance, -2046, 2045)) + else if (IN_RANGE (jump_distance, -2046, 2045 - extra)) return 2; else if (AVR_HAVE_JMP_CALL) return 3; diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index 28841e40db1..8c4819a901f 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -7605,7 +7605,7 @@ (define_peephole ; "*dec-and-branchsi!=-1.d.clobber" "sbc %C0,__zero_reg__" CR_TAB "sbc %D0,__zero_reg__", operands); -int jump_mode = avr_jump_mode (operands[2], insn); +int jump_mode = avr_jump_mode (operands[2], insn, 3 - avr_adiw_reg_p (operands[0])); const char *op = ((EQ == ) ^ (jump_mode == 1)) ? "brcc" : "brcs"; operands[1] = gen_rtx_CONST_STRING (VOIDmode, op); @@ -7642,7 +7642,7 @@ (define_peephole ; "*dec-and-branchhi!=-1" output_asm_insn ("subi %A0,1" CR_TAB "sbc %B0,__zero_reg__", operands); -int jump_mode = avr_jump_mode (operands[2], insn); +int jump_mode = avr_jump_mode (operands[2], insn, 1 - avr_adiw_reg_p (operands[0])); const char *op = ((EQ == ) ^ (jump_mode == 1)) ? "brcc" : "brcs"; operands[1] = gen_rtx_CONST_STRING (VOIDmode, op); @@ -7681,7 +7681,7 @@ (define_peephole ; "*dec-and-branchhi!=-1.d.clobber" output_asm_insn ("subi %A0,1" CR_TAB "sbc %B0,__zero_reg__", operands); -int jump_mode = avr_jump_mode (operands[2], insn); +int jump_mode = avr_jump_mode (operands[2], insn, 1 - avr_adiw_reg_p (operands[0])); const char *op = ((EQ == ) ^ (jump_mode == 1)) ? 
"brcc" : "brcs"; operands[1] = gen_rtx_CONST_STRING (VOIDmode, op); @@ -7718,7 +7718,7 @@ (define_peephole ; "*dec-and-branchhi!=-1.l.clobber" "sub %A0,%3" CR_TAB "sbc %B0,__zero_reg__", operands); -int jump_mode = avr_jump_mode (operands[2], insn); +int jump_mode = avr_jump_mode (operands[2], insn, 1 - avr_adiw_reg_p (operands[0])); const char *op = ((EQ == ) ^ (jump_mode == 1)) ? "brcc" : "brcs"; operands[1] = gen_rtx_CONST_STRING (VOIDmode, op); diff --git a/gcc/testsuite/gcc.target/avr/torture/pr116407-2.c b/gcc/testsuite/gcc.target/avr/torture/pr116407-2.c new file mode 100644 index 000..f580d129b5b --- /dev/null +++ b/gcc/testsuite/gcc.target/avr/torture/pr116407-2.c @@ -0,0 +1,34 @@ +/* { dg-do link } */ + +typedef __UINT16_TYP
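Because jump_distance is computed as cur_addr - dest_addr, backward branches have positive distances, so the new EXTRA argument only tightens the positive end of each range. A simplified C++ model of that check, using plain word addresses instead of insns; jump_mode_sketch and in_range_sketch are hypothetical, and the final fallback (returning 2 when there is no JMP/CALL) is an assumption because the quoted hunk ends before it.

```cpp
// Hypothetical helper equivalent to IN_RANGE (x, lo, hi).
bool in_range_sketch(int x, int lo, int hi) { return lo <= x && x <= hi; }

// Hypothetical, simplified model of avr_jump_mode; addresses are in words.
int jump_mode_sketch(int cur_addr, int dest_addr, bool have_jmp_call,
                     int extra = 0) {
  // Positive for backward branches (the target precedes the branch).
  int jump_distance = cur_addr - dest_addr;
  // EXTRA only narrows the backward (positive) end of each range.
  if (in_range_sketch(jump_distance, -63, 62 - extra))
    return 1;  // short relative branch
  if (in_range_sketch(jump_distance, -2046, 2045 - extra))
    return 2;  // longer relative jump
  if (have_jmp_call)
    return 3;  // absolute jump
  return 2;    // assumed fallback, not shown in the quoted hunk
}
```

With extra = 0 a backward distance of 62 still selects mode 1, while extra = 1 pushes it to mode 2. The peepholes in the patch pass 3 - avr_adiw_reg_p (operands[0]) for the SImode case and 1 - avr_adiw_reg_p (operands[0]) for the HImode cases, to account for the instructions they emit ahead of the branch.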
Re: [PATCH v1] Test: Move pr116278 run test to c-torture [NFC]
On 8/18/24 1:13 AM, pan2...@intel.com wrote: From: Pan Li Move the run test of pr116278 to c-torture and leave the risc-v the asm check under risc-v part. PR target/116278 gcc/testsuite/ChangeLog: * gcc.target/riscv/pr116278-run-1.c: Take compile instead of run test. * gcc.target/riscv/pr116278-run-2.c: Ditto. * gcc.c-torture/execute/pr116278-run-1.c: New test. * gcc.c-torture/execute/pr116278-run-2.c: New test. We should be using the dg-torture framework, so the right directory for the test is gcc.dg/torture. I suspect these tests (just based on the constants that appear) may not work on the 16 bit integer targets. So we may need /* { dg-require-effective-target int32 } */ But I don't mind faulting that in if/when we see the 16bit int targets complain. So OK in the right directory (gcc.dg/torture). Jeff
Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
On 8/18/24 12:10 AM, pan2...@intel.com wrote: From: Pan Li This patch would like to add test cases for the unsigned scalar quad and oct .SAT_TRUNC form 2. Aka: Form 2: #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ NT __attribute__((noinline)) \ sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ {\ WT max = (WT)(NT)-1; \ return x > max ? (NT) max : (NT)x; \ } QUAD: DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) OCT: DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) The below test is passed for this patch. * The rv64gcv regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat_u_trunc-10.c: New test. * gcc.target/riscv/sat_u_trunc-11.c: New test. * gcc.target/riscv/sat_u_trunc-12.c: New test. * gcc.target/riscv/sat_u_trunc-run-10.c: New test. * gcc.target/riscv/sat_u_trunc-run-11.c: New test. * gcc.target/riscv/sat_u_trunc-run-12.c: New test. Looks like they're failing in the upstream pre-commit tester: https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578 jeff
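The form-2 pattern clamps the wide value to the narrow type's maximum before truncating. As a sketch only, here is a C++ template rendering of DEF_SAT_U_TRUC_FMT_2 that covers the quad and oct cases listed above; the template name is hypothetical, and the testsuite itself uses the C macro.

```cpp
#include <cstdint>

// Hypothetical C++ counterpart of DEF_SAT_U_TRUC_FMT_2:
// NT is the narrow unsigned type, WT the wide unsigned type.
template <typename NT, typename WT>
NT sat_u_trunc_fmt_2(WT x) {
  // All-ones in NT (e.g. 0xFF for uint8_t), widened to WT.
  WT max = static_cast<WT>(static_cast<NT>(-1));
  // Saturate values above NT's range, otherwise truncate normally.
  return x > max ? static_cast<NT>(max) : static_cast<NT>(x);
}
```

For example, sat_u_trunc_fmt_2<uint8_t, uint32_t> (300) yields 255, while values already in range pass through unchanged.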
[patch,avr,applied] Tweak 16-bit additions with constant
The 16-bit additions like addhi3 have two forms: One with a scratch:QI and one without, where the latter is required because reload cannot deal with a scratch when spill code pops a 16-bit addition. Passes like combine and fwprop1 may come up with the non-scratch version, which is sub-optimal in the case when the addition is performed in a NO_LD_REGS register because the operands will be spilled to LD_REGS. Having a scratch:QI at disposal can lead to better code with fewer spills. Johann AVR: Tweak 16-bit addition with const that didn't get a LD_REGS register. The 16-bit additions like addhi3 have two forms: One with a scratch:QI and one without, where the latter is required because reload cannot deal with a scratch when spill code pops a 16-bit addition. Passes like combine and fwprop1 may come up with the non-scratch version, which is sub-optimal in the case when the addition is performed in a NO_LD_REGS register because the operands will be spilled to LD_REGS. Having a scratch:QI at disposal can lead to better code with fewer spills. gcc/ * config/avr/avr.md (*add3_split) [!reload_completed]: Add a scratch:QI to 16-bit additions with constant. diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md index 57f4a08c58c..c10709ecef0 100644 --- a/gcc/config/avr/avr.md +++ b/gcc/config/avr/avr.md @@ -1724,12 +1724,28 @@ (define_insn_and_split "*add3_split" (match_operand:ALL2 2 "nonmemory_or_const_operand" "r,s,IJ YIJ,n Ynn")))] "" "#" - "&& reload_completed" + "&& 1" [(parallel [(set (match_dup 0) (plus:ALL2 (match_dup 1) (match_dup 2))) (clobber (reg:CC REG_CC))])] - "" + { +// Passes like combine and fwprop1 may remove the scratch from an +// addhi3 insn. Add the scratch again because having a QImode +// scratch reg available is better than spilling the operands in +// the case when we don't get a d-regs register. +if (! reload_completed +&& const_operand (operands[2], mode) +&& ! stack_register_operand (operands[0], HImode) +&& ! 
stack_register_operand (operands[1], HImode)) + { +emit (gen_add3_clobber (operands[0], operands[1], operands[2])); +DONE; + } + +if (! reload_completed) + FAIL; + } [(set_attr "isa" "*,*,adiw,*")]) ;; "*addhi3"
Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg
On 8/12/24 3:50 PM, Jeff Law wrote: On 8/12/24 1:49 PM, Richard Sandiford wrote: - regno = subreg_regno (x); + /* A paradoxical should always be REGNO (y) + 0. Using subreg_regno + for something like (subreg:DI (reg:SI N) 0) on a WORDS_BIG_ENDIAN + target will return N-1 which is catastrophic for N == 0 and just + wrong for other cases. + + Fixing subreg_regno would be a better option, except that reload + depends on its current behavior. */ + if (paradoxical_subreg_p (x)) + regno = REGNO (y); + else + regno = subreg_regno (x); Are you sure that's right? For a 32-bit big-endian target, (subreg:DI (reg:SI 1) 0) really should simplify to (reg:DI 0) rather than (reg:DI 1). Correct, we want to get (reg:DI 0). We get "0" back from REGNO (y). And we get 0 back from byte_lowpart_offset (remember, it's paradoxical). The sum is 0 resulting in (reg:DI 0). So rewinding this discussion a bit. Focusing on this insn: (insn 77 75 80 6 (parallel [ (set (reg:DI 75 [ _32 ]) (plus:DI (reg:DI 73 [ _31 ]) (subreg:DI (reg/v:SI 41 [ __n ]) 0))) (clobber (scratch:SI)) ]) "j.C":50:38 discrim 1 155 {adddi3} (expr_list:REG_DEAD (reg:DI 73 [ _31 ]) (expr_list:REG_DEAD (reg/v:SI 41 [ __n ]) (nil Not surprisingly we're focused on the subreg expression in there. The first checkpoint in my mind is IRA's allocation where we assign it to reg 0. Popping a0(r41,l0) -- assign reg 0 So given the use inside a paradoxical subreg, do we consider this valid? After the discussion from last week, I'm leaning a bit more towards no than before. Let's take a simpler case, the meaning of: (subreg:DI (reg:SI 1) 0) Actually refers to d0, not d1 on the m68k. If we agree on that, then (subreg:DI (reg:SI 0) 0) Logically makes no sense since we can't reference register -1. Note that Georg has a roughly similar looking issue on the avr, but at the high end of the register file (little endian target) which would roughly correspond to the discussion we had last week about paradoxicals on little endian targets. 
In both cases I'm thinking now that the problem is really a failure to properly define HARD_REGNO_MODE_OK, particularly around boundary conditions where the multi-reg mode either can't be represented or crosses into other physical registers that have fixed uses. Jeff
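The register arithmetic behind the m68k example can be modelled directly. Assuming each hard register holds one word and WORDS_BIG_ENDIAN, the SImode value inside a paradoxical (subreg:DI (reg:SI N) 0) is the low-order word, which is the last word of the DImode register pair, so the pair must start at N - 1. The helper below is hypothetical and is not GCC's subreg_regno; it only illustrates why N == 0 would require register -1.

```cpp
// Hypothetical model, not GCC's subreg_regno: one word per hard register.
// Returns the first hard register of the outer (wider) value when an
// INNER_WORDS-word value in register INNER_REGNO is viewed through a
// paradoxical subreg of OUTER_WORDS words at byte offset 0.
int paradoxical_first_regno_sketch(int inner_regno, int inner_words,
                                   int outer_words, bool words_big_endian) {
  if (!words_big_endian)
    return inner_regno;  // the low-order word comes first
  // Big-endian word order: the inner value occupies the last words of
  // the outer value, so the outer value starts at a lower register.
  return inner_regno - (outer_words - inner_words);
}
```

Under these assumptions (subreg:DI (reg:SI 1) 0) maps to register 0 (d0), matching the m68k reading above, while (subreg:DI (reg:SI 0) 0) maps to -1, which no hard register can represent.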
Re: [PATCH, gfortran] libgfortran: implement fpu-macppc for Darwin, support IEEE arithmetic
Thanks Sergey, I have pushed the patch at https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=1cfe4a4d0d4447b364815d5e5c889deb2e533669 FX
Re: [PATCH] PHIOPT: move factor_out_conditional_operation over to use gimple_match_op
On 8/16/24 8:13 PM, Andrew Pinski wrote: To start working on expressions with more than one operand, converting over to use gimple_match_op is needed. The added side-effect here is factor_out_conditional_operation can now support builtins/internal calls that have one operand without any extra code added. Note on the changed testcases: * pr87007-5.c: the test was testing for avoiding partial register stalls for the sqrt and making sure there is only one zeroing of the register before the branch; the phiopt would now merge the sqrt's so disable phiopt. Bootstrapped and tested on x86_64-linux-gnu with no regressions. gcc/ChangeLog: * gimple-match-exports.cc (gimple_match_op::operands_occurs_in_abnormal_phi): New function. * gimple-match.h (gimple_match_op): Add operands_occurs_in_abnormal_phi. * tree-ssa-phiopt.cc (factor_out_conditional_operation): Use gimple_match_op instead of manually extracting from/creating the gimple. gcc/testsuite/ChangeLog: * gcc.target/i386/pr87007-5.c: Disable phi-opt. diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c b/gcc/testsuite/gcc.target/i386/pr87007-5.c index 8f2dc947f6c..1a240adef63 100644 --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c @@ -1,8 +1,11 @@ /* { dg-do compile } */ -/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized" } */ +/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized -fno-ssa-phiopt" } */ /* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt are sunk out of the loop and the loop is elided. One vsqrtsd with memory operand needs a xor to avoid partial dependence. */ +/* Phi-OPT needs to ne disabled otherwise, sqrt calls are merged which is better + but we are testing to make sure the partial register stall for SSE is still avoided + for sqrts. */ Nit.
s/to ne/to be/g OK with the nit fixed. Note this is getting closer to doing generalized sinking a common op through PHI nodes which is something we've wanted for a long time. Jeff
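At the source level, factoring a one-operand call out of a PHI behaves roughly like the rewrite below. This is a hedged C++ illustration only: the pass works on GIMPLE PHI nodes rather than source code, and both function names are hypothetical. According to the patch, phiopt would now merge the sqrt calls in pr87007-5.c this way, which is why that test gains -fno-ssa-phiopt.

```cpp
#include <cmath>

// Before: the same one-operand call appears on both arms, so the join
// point sees PHI <sqrt (a), sqrt (b)>.
double sqrt_on_each_arm(bool c, double a, double b) {
  double r;
  if (c)
    r = std::sqrt(a);
  else
    r = std::sqrt(b);
  return r;
}

// After factoring: select the operand first (PHI <a, b>), then make a
// single call after the join.
double sqrt_factored(bool c, double a, double b) {
  double operand = c ? a : b;
  return std::sqrt(operand);
}
```

Both versions return the same value for every input; the second simply executes one call site instead of two, which is the "generalized sinking of a common op through PHI nodes" mentioned in the review.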
Re: [committed][rtl-optimization/116244] Don't create bogus regs in alter_subreg
On 8/18/24 10:40 AM, Jeff Law wrote: After the discussion from last week, I'm leaning a bit more towards no than before. Let's take a simpler case, the meaning of: (subreg:DI (reg:SI 1) 0) Actually refers to d0, not d1 on the m68k. If we agree on that, then (subreg:DI (reg:SI 0) 0) Logically makes no sense since we can't reference register -1. Note that Georg has a roughly similar looking issue on the avr, but at the high end of the register file (little endian target) which would roughly correspond to the discussion we had last week about paradoxicals on little endian targets. In both cases I'm thinking now that the problem is really a failure to properly define HARD_REGNO_MODE_OK, particularly around boundary conditions where the multi-reg mode either can't be represented or crosses into other physical registers that have fixed uses. The alternative is to consider that subreg expression valid after IRA and to expect reload/LRA to clean it up. It's a legitimate argument and in some ways reload is already expecting to do this cleanup. jeff
[PATCH] testsuite: Prune compilation messages for modules tests
As noticed when verifying the dejagnu fix. Tested cris-elf with a new newlib that arranges to emit the mentioned warning, with/without the update in dejagnu to handle the minuscule "in". Ok to commit? -- >8 -- All testsuite compiler calls pass through default_target_compile in the dejagnu installation (typically /usr/share/dejagnu/target.exp) which also calls the dejagnu-installed prune_warnings. Normally, tests using the dg framework (most or all tests these days) compile and link by calling various wrappers that end up calling dg-test in the dejagnu installation, typically installed as /usr/share/dejagnu/dg.exp. That, besides the compiler call, also calls ${tool}-dg-prune (g++-dg-prune) on the messages, which in turn ends up calling prune_gcc_output in gcc/testsuite/lib/prune.exp. That gcc-specific "pruning" function handles more cases than the dejagnu prune_warnings, and also has updated patterns. But module_do_it in modules.exp calls the lower-level ${tool}_target_compile "directly", i.e. g++_target_compile defined in gcc/testsuite/lib/g++.exp. That does not call ${tool}-dg-prune, meaning those test-cases miss the gcc-specific pruning. Noticed while testing a dejagnu update that handled the minuscule "in" in the warning (line-breaks added below besides the original one after "(void*)':") "/path/to/cris-elf/bin/ld: /gccobj/cris-elf/./libstdc++-v3/src/.libs/libstdc++.a(random.o): in function `std::(anonymous namespace)::__libc_getentropy(void*)': /gccsrc/libstdc++-v3/src/c++11/random.cc:183: warning: _getentropy is not implemented and will always fail" The line saying "in function" rather than "In function" (from the binutils linker since 2018) is pruned by prune_gcc_output. The prune_warnings in dejagnu-1.6.3 and earlier handles the second line separately. It's an unfortunate wart that neither consumes the delimiting line-break, leaving it to the callers to prune residual empty lines.
See prune_warnings in dejagnu (default_target_compile and dg-test) for those other line-break fixups, as alluded to in the comment. * g++.dg/modules/modules.exp (module_do_it): Prune compilation messages. --- gcc/testsuite/g++.dg/modules/modules.exp | 10 ++ 1 file changed, 10 insertions(+) diff --git a/gcc/testsuite/g++.dg/modules/modules.exp b/gcc/testsuite/g++.dg/modules/modules.exp index 3e8df9b89309..e6bf28d8b1a0 100644 --- a/gcc/testsuite/g++.dg/modules/modules.exp +++ b/gcc/testsuite/g++.dg/modules/modules.exp @@ -205,9 +205,19 @@ proc module_do_it { do_what testcase std asm_list } { if { !$ok } { unresolved "$ident link" } else { + global target_triplet set out [${tool}_target_compile $asm_list \ $execname executable $options] eval $xfail + + # Do gcc-specific pruning. + set out [${tool}-dg-prune $target_triplet $out] + # Fix up remaining line-breaks similar to "regular" pruning + # calls. Otherwise, a multi-line message stripped e.g. one + # part by the default prune_warnings and one part by the + # gcc prune_gcc_output will have a residual line-break. + regsub "^\[\r\n\]+" $out "" out + if { $out == "" } { pass "$ident link" } else { -- 2.30.2
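The trailing regsub matters because pruning can leave behind only the line-break that delimited a removed message, and the link test treats any non-empty output as a failure. The same cleanup expressed in C++, as a sketch; strip_leading_line_breaks is a hypothetical name and nothing like it exists in the testsuite.

```cpp
#include <string>

// Hypothetical C++ equivalent of: regsub "^\[\r\n\]+" $out "" out
// Removes any run of CR/LF characters at the start of the output only.
std::string strip_leading_line_breaks(const std::string &out) {
  std::string::size_type first = out.find_first_not_of("\r\n");
  if (first == std::string::npos)
    return std::string();  // nothing but line-breaks (or empty)
  return out.substr(first);
}
```

With this cleanup, an output consisting only of a residual "\n" becomes empty and the `$out == ""` check passes, while any real diagnostic text after the leading line-breaks is kept.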
Re: [PATCH] libstdc++: Remove note from the GCC 4.0.1 days
On Sun, Aug 18, 2024 at 4:52 AM Gerald Pfeifer wrote: > > When I updated one of the links yesterday I noticed we have this obsolete > reference to GCC 4.0.1 and binutils 2.15.90.0.1.1 from 19 (nineteen) years > ago. > > I suggest we remove these. > Instead of just removing it, I wonder if it might be worthwhile to just bump the version numbers to something more recent? What's the current minimum version of binutils that libstdc++ requires?
Re: [PATCH] libstdc++: Remove note from the GCC 4.0.1 days
On Sun, Aug 18, 2024 at 3:39 PM Eric Gallager wrote: > > On Sun, Aug 18, 2024 at 4:52 AM Gerald Pfeifer wrote: > > > > When I updated one of the links yesterday I noticed we have this obsolete > > reference to GCC 4.0.1 and binutils 2.15.90.0.1.1 from 19 (nineteen) years > > ago. > > > > I suggest we remove these. > > > > Instead of just removing it, I wonder if it might be worthwhile to > just bump the version numbers to something more recent? What's the > current minimum version of binutils that libstdc++ requires? Well considering the binutils version is also mentioned as part of the prerequisites for GCC with a newish version; I think mentioning it also (which might get out of sync) in libstdc++ manual a little over board. See https://gcc.gnu.org/install/prerequisites.html . Thanks, Andrew Pinski > > > Okay? > > > > Gerald > > > > > > libstdc++-v3: > > * doc/xml/manual/prerequisites.xml: Remove note from the GCC 4.0.1 > > days. > > * doc/html/manual/setup.html: Regenerate. > > > > diff --git a/libstdc++-v3/doc/html/manual/setup.html > > b/libstdc++-v3/doc/html/manual/setup.html > > index 78d2a00c50a..d8c5ff65cff 100644 > > --- a/libstdc++-v3/doc/html/manual/setup.html > > +++ b/libstdc++-v3/doc/html/manual/setup.html > > @@ -29,10 +29,7 @@ > > the tools you will need if you wish to modify the source. > > > > Additional data is given here only where it applies to libstdc++. > > - As of GCC 4.0.1 the minimum version of binutils required to build > > - libstdc++ is 2.15.90.0.1.1. > > - Older releases of libstdc++ do not require such a recent version, > > - but to take full advantage of useful space-saving features and > > + To take full advantage of useful space-saving features and > >bug-fixes you should use a recent binutils whenever possible. > >The configure process will automatically detect and use these > >features if the underlying support is present. 
> > diff --git a/libstdc++-v3/doc/xml/manual/prerequisites.xml > > b/libstdc++-v3/doc/xml/manual/prerequisites.xml > > index a3c6e732a77..0efe63bcd46 100644 > > --- a/libstdc++-v3/doc/xml/manual/prerequisites.xml > > +++ b/libstdc++-v3/doc/xml/manual/prerequisites.xml > > @@ -25,10 +25,7 @@ > > Additional data is given here only where it applies to libstdc++. > > > > > > - As of GCC 4.0.1 the minimum version of binutils required to build > > - libstdc++ is 2.15.90.0.1.1. > > - Older releases of libstdc++ do not require such a recent version, > > - but to take full advantage of useful space-saving features and > > + To take full advantage of useful space-saving features and > >bug-fixes you should use a recent binutils whenever possible. > >The configure process will automatically detect and use these > >features if the underlying support is present.
Re: [PATCH] libstdc++: Remove note from the GCC 4.0.1 days
On Sun, Aug 18, 2024 at 3:42 PM Andrew Pinski wrote: > > On Sun, Aug 18, 2024 at 3:39 PM Eric Gallager wrote: > > > > On Sun, Aug 18, 2024 at 4:52 AM Gerald Pfeifer wrote: > > > > > > When I updated one of the links yesterday I noticed we have this obsolete > > > reference to GCC 4.0.1 and binutils 2.15.90.0.1.1 from 19 (nineteen) years > > > ago. > > > > > > I suggest we remove these. > > > > > > > Instead of just removing it, I wonder if it might be worthwhile to > > just bump the version numbers to something more recent? What's the > > current minimum version of binutils that libstdc++ requires? > > Well considering the binutils version is also mentioned as part of the > prerequisites for GCC with a newish version; I think mentioning it > also (which might get out of sync) in libstdc++ manual a little over > board. > See https://gcc.gnu.org/install/prerequisites.html . Looks like most of the versions mentioned in https://gcc.gnu.org/install/specific.html need to be updated to at least the version that was mentioned in libstdc++'s manual. hppa*-hp-hpux11, i?86-*-linux*, and sparc-sun-solaris2* all mention versions older than 2.15.9. At least https://gcc.gnu.org/install/prerequisites.html recommends 2.35+ (due to LTO requirements). Thanks, Andrew > > Thanks, > Andrew Pinski > > > > > > Okay? > > > > > > Gerald > > > > > > > > > libstdc++-v3: > > > * doc/xml/manual/prerequisites.xml: Remove note from the GCC 4.0.1 > > > days. > > > * doc/html/manual/setup.html: Regenerate. > > > > > > diff --git a/libstdc++-v3/doc/html/manual/setup.html > > > b/libstdc++-v3/doc/html/manual/setup.html > > > index 78d2a00c50a..d8c5ff65cff 100644 > > > --- a/libstdc++-v3/doc/html/manual/setup.html > > > +++ b/libstdc++-v3/doc/html/manual/setup.html > > > @@ -29,10 +29,7 @@ > > > the tools you will need if you wish to modify the source. > > > > > > Additional data is given here only where it applies to libstdc++. 
> > > - As of GCC 4.0.1 the minimum version of binutils required to > > > build > > > - libstdc++ is 2.15.90.0.1.1. > > > - Older releases of libstdc++ do not require such a recent version, > > > - but to take full advantage of useful space-saving features and > > > + To take full advantage of useful space-saving features and > > >bug-fixes you should use a recent binutils whenever possible. > > >The configure process will automatically detect and use these > > >features if the underlying support is present. > > > diff --git a/libstdc++-v3/doc/xml/manual/prerequisites.xml > > > b/libstdc++-v3/doc/xml/manual/prerequisites.xml > > > index a3c6e732a77..0efe63bcd46 100644 > > > --- a/libstdc++-v3/doc/xml/manual/prerequisites.xml > > > +++ b/libstdc++-v3/doc/xml/manual/prerequisites.xml > > > @@ -25,10 +25,7 @@ > > > Additional data is given here only where it applies to libstdc++. > > > > > > > > > - As of GCC 4.0.1 the minimum version of binutils required to > > > build > > > - libstdc++ is 2.15.90.0.1.1. > > > - Older releases of libstdc++ do not require such a recent version, > > > - but to take full advantage of useful space-saving features and > > > + To take full advantage of useful space-saving features and > > >bug-fixes you should use a recent binutils whenever possible. > > >The configure process will automatically detect and use these > > >features if the underlying support is present.
[committed][PR rtl-optimization/115876] Avoid ubsan in ext-dce.cc
This fixes two general ubsan issues in ext-dce, both related to use-side processing of modes > DImode.

In ext_dce_process_uses we can be presented with something like this as a use: (subreg:SI (reg:TF) 12). That will result in an out-of-range shift for a HOST_WIDE_INT object. Where this happens, it is safe to just break from the SET context and process the subobjects. This will ultimately result in seeing (reg:TF), and we'll mark all bit groups as live.

In carry_backpropagate we can be presented with a TImode shift (for example), and the shift count can be > 63 for such a shift. This naturally trips ubsan as well, since we're operating on 64-bit objects. We can just return mmask in this case, noting that every bit group is live.

The combination of these two fixes eliminates all the reported ubsan issues in ext-dce seen in a bootstrap and regression test on x86. While I was in there I went ahead and fixed the various hardcoded 63/64 values to be HOST_BITS_PER_WIDE_INT based.

Bootstrapped and regression tested on x86 with no regressions. Also built with ubsan enabled and verified the build logs and testsuite logs don't call out any issues in ext-dce anymore. Pushing to the trunk.

Jeff

commit f10d2ee95356b9de6c44d701c4dfa8fb088714d2
Author: Jeff Law
Date: Sun Aug 18 16:55:52 2024 -0600

[PR rtl-optimization/115876] Avoid ubsan in ext-dce.cc

This fixes two general ubsan issues in ext-dce, both related to use-side processing of modes > DImode. In ext_dce_process_uses we can be presented with something like this as a use: (subreg:SI (reg:TF) 12). That will result in an out-of-range shift for a HOST_WIDE_INT object. Where this happens, it is safe to just break from the SET context and process the subobjects. This will ultimately result in seeing (reg:TF), and we'll mark all bit groups as live. In carry_backpropagate we can be presented with a TImode shift (for example), and the shift count can be > 63 for such a shift. This naturally trips ubsan as well, since we're operating on 64-bit objects.
We can just return mmask in this case noting that every bit group is live. The combination of these two fixes eliminates all the reported ubsan issues in ext-dce seen in a bootstrap and regression test on x86. While I was in there I went ahead and fixed the various hardcoded 63/64 values to be HOST_BITS_PER_WIDE_INT based. Bootstrapped and regression tested on x86 with no regressions. Also built with ubsan enabled and verified the build logs and testsuite logs don't call out any issues in ext-dce anymore. Pushing to the trunk. PR rtl-optimization/115876 gcc * ext-dce.cc (ext_dce_process_sets): Replace hardcoded 63/64 instances with HOST_BITS_PER_WIDE_INT based values. (carry_backpropagate): Handle modes with more bits than HOST_BITS_PER_WIDE_INT gracefully, avoiding undefined behavior. (ext_dce_process_uses): Handle subreg offsets which would result in ubsan shifts gracefully, avoiding undefined behavior. diff --git a/gcc/ext-dce.cc b/gcc/ext-dce.cc index 017e2de000d..eee9208f0d6 100644 --- a/gcc/ext-dce.cc +++ b/gcc/ext-dce.cc @@ -207,7 +207,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) wider than DImode. */ scalar_int_mode outer_mode; if (!is_a (GET_MODE (x), &outer_mode) - || GET_MODE_BITSIZE (outer_mode) > 64) + || GET_MODE_BITSIZE (outer_mode) > HOST_BITS_PER_WIDE_INT) { /* Skip the subrtxs of this destination. There is little value in iterating into the subobjects, so @@ -239,7 +239,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) that case. Remember, we can not just continue to process the inner RTXs due to the STRICT_LOW_PART. */ if (!is_a (GET_MODE (SUBREG_REG (x)), &outer_mode) - || GET_MODE_BITSIZE (outer_mode) > 64) + || GET_MODE_BITSIZE (outer_mode) > HOST_BITS_PER_WIDE_INT) { /* Skip the subrtxs of the STRICT_LOW_PART. 
We can't process them because it'll set objects as no longer @@ -293,7 +293,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) the top of the loop which just complicates the flow even more. */ if (!is_a (GET_MODE (SUBREG_REG (x)), &outer_mode) - || GET_MODE_BITSIZE (outer_mode) > 64) + || GET_MODE_BITSIZE (outer_mode) > HOST_BITS_PER_WIDE_INT) { skipped_dest = true; iter.skip_subrtxes (); @@ -329,7 +329,7 @@ ext_dce_process_sets (rtx_insn *insn, rtx obj, bitmap live_tmp) } /* BIT >= 64 indicates someth
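The undefined behavior in question is a shift of a 64-bit `unsigned HOST_WIDE_INT` by a count equal to or larger than its width. A minimal standalone sketch of the guard pattern follows, assuming `uint64_t` in place of `unsigned HOST_WIDE_INT` and a hypothetical `HOST_BITS` macro in place of `HOST_BITS_PER_WIDE_INT`; `live_bits_for_ashift` is an illustrative helper, not the actual `carry_backpropagate` code.

```c
#include <stdint.h>

/* Stand-in for HOST_BITS_PER_WIDE_INT on a 64-bit host (assumption).  */
#define HOST_BITS 64

/* Hypothetical helper: MASK holds the bits that are live in the result
   of a left shift by COUNT; return the bits of the shifted operand that
   must be kept live.  MMASK is the mode mask of the operand.  */
static uint64_t
live_bits_for_ashift (uint64_t mask, uint64_t mmask, unsigned int count)
{
  /* Shifting a 64-bit value by 64 or more is undefined behavior in C,
     which is what UBSan reports.  Be conservative instead and treat
     every bit group as live.  */
  if (count >= HOST_BITS)
    return mmask;

  /* Bit I of the operand lands in bit I + COUNT of the result, so it is
     live iff that result bit is live.  */
  return (mask >> count) & mmask;
}
```

Returning the full mode mask is the conservative answer: it can only keep alive extensions that might otherwise have been removed, never the reverse.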
Re: [PATCH] PHIOPT: move factor_out_conditional_operation over to use gimple_match_op
On Sun, Aug 18, 2024 at 11:06 AM Jeff Law wrote: > > > > On 8/16/24 8:13 PM, Andrew Pinski wrote: > > To start working on more with expressions with more than one operand, > > converting > > over to use gimple_match_op is needed. > > The added side-effect here is factor_out_conditional_operation can now > > support > > builtins/internal calls that has one operand without any extra code added. > > > > Note on the changed testcases: > > * pr87007-5.c: the test was testing testing for avoiding partial register > > stalls > > for the sqrt and making sure there is only one zero of the register before > > the > > branch, the phiopt would now merge the sqrt's so disable phiopt. > > > > Bootstrapped and tested on x86_64-linux-gnu with no regressions. > > > > gcc/ChangeLog: > > > > * gimple-match-exports.cc > > (gimple_match_op::operands_occurs_in_abnormal_phi): > > New function. > > * gimple-match.h (gimple_match_op): Add > > operands_occurs_in_abnormal_phi. > > * tree-ssa-phiopt.cc (factor_out_conditional_operation): Use > > gimple_match_op > > instead of manually extracting from/creating the gimple. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/i386/pr87007-5.c: Disable phi-opt. > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr87007-5.c > > b/gcc/testsuite/gcc.target/i386/pr87007-5.c > > index 8f2dc947f6c..1a240adef63 100644 > > --- a/gcc/testsuite/gcc.target/i386/pr87007-5.c > > +++ b/gcc/testsuite/gcc.target/i386/pr87007-5.c > > @@ -1,8 +1,11 @@ > > /* { dg-do compile } */ > > -/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse > > -fno-tree-vectorize -fdump-tree-cddce3-details > > -fdump-tree-lsplit-optimized" } */ > > +/* { dg-options "-Ofast -march=skylake-avx512 -mfpmath=sse > > -fno-tree-vectorize -fdump-tree-cddce3-details -fdump-tree-lsplit-optimized > > -fno-ssa-phiopt" } */ > > /* Load of d2/d3 is hoisted out, the loop is split, store of d1 and sqrt > > are sunk out of the loop and the loop is elided. 
One vsqrtsd with > > memory operand needs a xor to avoid partial dependence. */ > > +/* Phi-OPT needs to ne disabled otherwise, sqrt calls are merged which is > > better > > + but we are testing to make sure the partial register stall for SSE is > > still avoided > > + for sqrts. */ > Nit. s/to ne/to be/g > > OK with the nit fixed. > > Note this is getting closer to doing generalized sinking a common op > through PHI nodes which is something we've wanted for a long time. Yes that is the plan; I just want to do it in steps as I have a few other projects in progress; and I don't know how much of each I will be able to get done in time for GCC 15. I originally had this done differently but I thought it would be better to reuse infrastructure that was already there instead of creating new ones. I had implemented this patch back in April and I didn't know if I could get to the rest due to other projects going on so I submitted it finally. Thanks, Andrew > > Jeff >
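At the source level, the factoring looks roughly like the pair of functions below: when both arms of a conditional apply the same one-operand call, the PHI can select the operand instead, and a single call is applied afterwards. The testcase involves `sqrt`; this sketch uses `__builtin_popcount` instead (an assumption chosen only so it runs without libm), and the patch does not say explicitly that phiopt factors this particular builtin.

```c
/* Before: each arm applies the same one-operand builtin, and the
   results are merged by a PHI node at the join point.  */
int
before_factoring (int cond, unsigned int a, unsigned int b)
{
  int r;
  if (cond)
    r = __builtin_popcount (a);
  else
    r = __builtin_popcount (b);
  return r;
}

/* After: the PHI selects the operand, and one call is applied to the
   selected value.  */
int
after_factoring (int cond, unsigned int a, unsigned int b)
{
  return __builtin_popcount (cond ? a : b);
}
```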
[PATCH 2/4] Write CodeView information about optimized stack variables
Outputs S_DEFRANGE_REGISTER_REL symbols for optimized local variables that are on the stack, consisting of the stack register, the offset, and the code range for which this applies. gcc/ * dwarf2codeview.cc (enum cv_sym_type): Add S_DEFRANGE_REGISTER_REL. (write_defrange_register_rel): New function. (write_optimized_local_variable_loc): Add fbloc param, and call write_defrange_register_rel. (write_optimized_local_variable): Add fbloc param. (write_optimized_function_vars): Add fbloc param. --- gcc/dwarf2codeview.cc | 128 +++--- 1 file changed, 119 insertions(+), 9 deletions(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 15253978968..74bbf6bc1d7 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -79,6 +79,7 @@ enum cv_sym_type { S_COMPILE3 = 0x113c, S_LOCAL = 0x113e, S_DEFRANGE_REGISTER = 0x1141, + S_DEFRANGE_REGISTER_REL = 0x1145, S_LPROC32_ID = 0x1146, S_GPROC32_ID = 0x1147, S_PROC_ID_END = 0x114f @@ -2409,10 +2410,113 @@ write_defrange_register (dw_loc_descr_ref expr, rtx range_start, rtx range_end) targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); } +/* Write an S_DEFRANGE_REGISTER_REL symbol, which describes a range for which + an S_LOCAL variable is held in memory given by the value of a certain + register plus an offset. 
*/ + +static void +write_defrange_register_rel (dw_loc_descr_ref expr, dw_loc_descr_ref fbloc, +rtx range_start, rtx range_end) +{ + unsigned int label_num = ++sym_label_num; + uint16_t regno; + int offset; + + /* This is defrange_register_rel in binutils and DEFRANGESYMREGISTERREL in + Microsoft's cvinfo.h: + + struct lvar_addr_range + { + uint32_t offset; + uint16_t section; + uint16_t length; + } ATTRIBUTE_PACKED; + + struct lvar_addr_gap { + uint16_t offset; + uint16_t length; + } ATTRIBUTE_PACKED; + + struct defrange_register_rel + { + uint16_t size; + uint16_t kind; + uint16_t reg; + uint16_t offset_parent; + uint32_t offset_register; + struct lvar_addr_range range; + struct lvar_addr_gap gaps[]; + } ATTRIBUTE_PACKED; +*/ + + if (!fbloc) +return; + + if (fbloc->dw_loc_opc >= DW_OP_breg0 && fbloc->dw_loc_opc <= DW_OP_breg31) +{ + regno = dwarf_reg_to_cv (fbloc->dw_loc_opc - DW_OP_breg0); + offset = fbloc->dw_loc_oprnd1.v.val_int; +} + else if (fbloc->dw_loc_opc == DW_OP_bregx) +{ + regno = dwarf_reg_to_cv (fbloc->dw_loc_oprnd1.v.val_int); + offset = fbloc->dw_loc_oprnd2.v.val_int; +} + else +{ + return; +} + + if (expr->dw_loc_oprnd1.val_class != dw_val_class_unsigned_const) +return; + + offset += expr->dw_loc_oprnd1.v.val_int; + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, + "%L" SYMBOL_END_LABEL "%u - %L" SYMBOL_START_LABEL "%u\n", + label_num, label_num); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_START_LABEL, label_num); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, S_DEFRANGE_REGISTER_REL); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, regno); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, offset); + putc ('\n', asm_out_file); + + 
asm_fprintf (asm_out_file, "\t.secrel32 "); + output_addr_const (asm_out_file, range_start); + fputc ('\n', asm_out_file); + + asm_fprintf (asm_out_file, "\t.secidx "); + output_addr_const (asm_out_file, range_start); + fputc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + output_addr_const (asm_out_file, range_end); + fputs (" - ", asm_out_file); + output_addr_const (asm_out_file, range_start); + putc ('\n', asm_out_file); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); +} + /* Try to write an S_DEFRANGE_* symbol for the given DWARF location. */ static void -write_optimized_local_variable_loc (dw_loc_descr_ref expr, rtx range_start, +write_optimized_local_variable_loc (dw_loc_descr_ref expr, + dw_loc_descr_ref fbloc, rtx range_start, rtx range_end) { if (expr->dw_loc_next) @@ -2462,6 +2566,10 @@ write_optimized_local_variable_loc (dw_loc_descr_ref expr, rtx range_start, write_defrange_register (expr, range_start, range_end); break; +case DW_OP_fbreg: + write_defrange_register_rel (expr, fbloc, range_start, range_end); + break; + default: break; } @@ -2473,7 +2581,8 @@ write_optimized_local_variable_lo
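The register and offset in the S_DEFRANGE_REGISTER_REL record come from two places: the function's frame base (`fbloc`, a `DW_OP_bregN` or `DW_OP_bregx` expression) and the variable's own `DW_OP_fbreg` offset, which are added together. A simplified sketch of that resolution step, assuming a hypothetical `struct loc_op` in place of GCC's `dw_loc_descr_ref`; the opcode values are those from the DWARF specification, and the mapping to a CodeView register (`dwarf_reg_to_cv`) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* DWARF opcode values from the DWARF specification.  */
enum
{
  OP_breg0 = 0x70,
  OP_breg31 = 0x8f,
  OP_fbreg = 0x91,
  OP_bregx = 0x92
};

/* Hypothetical, simplified location operation: for bregN and fbreg,
   OPERAND1 is the offset; for bregx, OPERAND1 is the register and
   OPERAND2 is the offset.  */
struct loc_op
{
  uint8_t opcode;
  int64_t operand1;
  int64_t operand2;
};

/* Resolve a DW_OP_fbreg variable location against the frame base,
   yielding a DWARF register number and a register-relative offset.
   Returns false when the frame base is not a single register plus
   offset, mirroring the early returns in write_defrange_register_rel.  */
static bool
resolve_fbreg (const struct loc_op *fbloc, const struct loc_op *var,
               unsigned int *dwarf_regno, int64_t *offset)
{
  if (!fbloc || var->opcode != OP_fbreg)
    return false;

  if (fbloc->opcode >= OP_breg0 && fbloc->opcode <= OP_breg31)
    {
      *dwarf_regno = fbloc->opcode - OP_breg0;
      *offset = fbloc->operand1;
    }
  else if (fbloc->opcode == OP_bregx)
    {
      *dwarf_regno = (unsigned int) fbloc->operand1;
      *offset = fbloc->operand2;
    }
  else
    return false;

  /* The variable lives at the frame base plus its fbreg offset.  */
  *offset += var->operand1;
  return true;
}
```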
[PATCH 3/4] Write CodeView S_FRAMEPROC symbols
Write S_FRAMEPROC symbols, which aren't very useful but seem to be necessary for Microsoft debuggers to function properly. These symbols come after S_LOCAL symbols for optimized variables, but before S_REGISTER and S_REGREL32 for unoptimized variables. gcc/ * dwarf2codeview.cc (enum cv_sym_type): Add S_FRAMEPROC. (write_s_frameproc): New function. (write_function): Call write_s_frameproc. --- gcc/dwarf2codeview.cc | 80 +-- 1 file changed, 78 insertions(+), 2 deletions(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 74bbf6bc1d7..88310504cf7 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -71,6 +71,7 @@ along with GCC; see the file COPYING3. If not see enum cv_sym_type { S_END = 0x0006, + S_FRAMEPROC = 0x1012, S_BLOCK32 = 0x1103, S_REGISTER = 0x1106, S_LDATA32 = 0x110c, @@ -2822,6 +2823,74 @@ write_s_end (void) targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); } +/* Write the S_FRAMEPROC symbol, which is supposed to give information about + the function frame. It doesn't seem to be really used in modern versions of + MSVC, which is why we zero-out everything here. You still need to write it + though, otherwise windbg won't necessarily show all the local variables. 
*/ + +static void +write_s_frameproc (void) +{ + unsigned int label_num = ++sym_label_num; + + /* This is struct FRAMEPROCSYM in Microsoft's cvinfo.h: + + struct frameprocsym + { + uint16_t size; + uint16_t kind; + uint32_t frame_size; + uint32_t padding_size; + uint32_t padding_offset; + uint32_t saved_registers_size; + uint32_t exception_handler_offset; + uint16_t exception_handler_section; + uint32_t flags; + } ATTRIBUTE_PACKED; + */ + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, + "%L" SYMBOL_END_LABEL "%u - %L" SYMBOL_START_LABEL "%u\n", + label_num, label_num); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_START_LABEL, label_num); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, S_FRAMEPROC); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); +} + /* Loop through the DIEs in an unoptimized function, writing out any variables or blocks that we encounter. 
*/ @@ -3070,9 +3139,16 @@ write_function (codeview_symbol *s) fbloc = frame_base->dw_attr_val.v.val_loc; if (flag_var_tracking) -write_optimized_function_vars (s->function.die, fbloc, rtx_low, rtx_high); +{ + write_optimized_function_vars (s->function.die, fbloc, rtx_low, +rtx_high); + write_s_frameproc (); +} else -write_unoptimized_function_vars (s->function.die, fbloc); +{ + write_s_frameproc (); + write_unoptimized_function_vars (s->function.die, fbloc); +} /* Output the S_PROC_ID_END record. */ -- 2.44.2
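The values written after the length field (a 16-bit kind, five 32-bit zeros, a 16-bit section index and a 32-bit flags word) give S_FRAMEPROC a fixed length of 28 bytes. A sketch that checks this arithmetic against the layout quoted in the patch comment, assuming GCC/Clang's `packed` attribute; the struct name and helper are illustrative and not part of dwarf2codeview.cc.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout of the S_FRAMEPROC record as described in the comment in
   write_s_frameproc (FRAMEPROCSYM in Microsoft's cvinfo.h).  The packed
   attribute prevents the compiler from inserting padding.  */
struct __attribute__ ((packed)) frameprocsym
{
  uint16_t size;   /* Record length, excluding this field.  */
  uint16_t kind;   /* S_FRAMEPROC (0x1012).  */
  uint32_t frame_size;
  uint32_t padding_size;
  uint32_t padding_offset;
  uint32_t saved_registers_size;
  uint32_t exception_handler_offset;
  uint16_t exception_handler_section;
  uint32_t flags;
};

/* What the SYMBOL_END_LABEL - SYMBOL_START_LABEL expression should
   evaluate to: everything after the size field.  */
static size_t
frameproc_record_length (void)
{
  return sizeof (struct frameprocsym) - sizeof (uint16_t);
}
```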
[PATCH 4/4] Write CodeView information about static locals in optimized code
Write CodeView S_LDATA32 symbols for static locals in optimized code. We have to handle these separately, as they come after the S_FRAMEPROC, plus you can't have S_BLOCK32 symbols like you can in unoptimized code. gcc/ * dwarf2codeview.cc (write_optimized_static_local_vars): New function. (write_function): Call write_optimized_static_local_vars. --- gcc/dwarf2codeview.cc | 57 +++ 1 file changed, 57 insertions(+) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index 88310504cf7..e4c67f921cd 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -3007,6 +3007,62 @@ write_optimized_function_vars (dw_die_ref die, dw_loc_descr_ref fbloc, while (c != first_child); } +/* There's no way to mark the range of a static local variable in an optimized + function: there's no S_DEFRANGE_* symbol for this, and you can't have + S_BLOCK32 symbols. So instead we have to loop through after the S_FRAMEPROC + has been written, and write the S_LDATA32s at the end. */ + +static void +write_optimized_static_local_vars (dw_die_ref die) +{ + dw_die_ref first_child, c; + + first_child = dw_get_die_child (die); + + if (!first_child) +return; + + c = first_child; + do + { +c = dw_get_die_sib (c); + +switch (dw_get_die_tag (c)) + { + case DW_TAG_variable: + { + dw_attr_node *loc; + dw_loc_descr_ref loc_ref; + + loc = get_AT (c, DW_AT_location); + if (!loc) + break; + + if (loc->dw_attr_val.val_class != dw_val_class_loc) + break; + + loc_ref = loc->dw_attr_val.v.val_loc; + if (!loc_ref) + break; + + if (loc_ref->dw_loc_opc != DW_OP_addr) + break; + + write_local_s_ldata32 (c, loc_ref); + break; + } + + case DW_TAG_lexical_block: + write_optimized_static_local_vars (c); + break; + + default: + break; + } + } + while (c != first_child); +} + /* Write an S_GPROC32_ID symbol, representing a global function, or an S_LPROC32_ID symbol, for a static function. 
*/ @@ -3143,6 +3199,7 @@ write_function (codeview_symbol *s) write_optimized_function_vars (s->function.die, fbloc, rtx_low, rtx_high); write_s_frameproc (); + write_optimized_static_local_vars (s->function.die); } else { -- 2.44.2
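The `do`/`while` loop in `write_optimized_static_local_vars` relies on how dwarf2out.cc links children: the parent's child pointer refers to the last child, and the sibling pointers form a circular list, so the first `dw_get_die_sib` call yields the first child and the loop stops after the last one. A simplified sketch of the same walk, assuming a hypothetical `struct die` and using the DWARF tag values for `DW_TAG_variable` (0x34) and `DW_TAG_lexical_block` (0x0b); it only counts variables rather than emitting S_LDATA32 records.

```c
#include <stddef.h>

/* Hypothetical, simplified DIE: CHILD points at the last child, and the
   children's SIB pointers form a circular list (last->sib is the first
   child).  */
struct die
{
  int tag;
  struct die *child;
  struct die *sib;
};

enum { TAG_variable = 0x34, TAG_lexical_block = 0x0b };

/* Count variables in DIE and, recursively, in nested lexical blocks,
   using the same walk as write_optimized_static_local_vars.  */
static int
count_nested_vars (const struct die *die)
{
  const struct die *last = die->child;
  const struct die *c;
  int n = 0;

  if (!last)
    return 0;

  c = last;
  do
    {
      /* Advancing first visits the first child, and the loop ends after
         processing LAST, so every child is seen exactly once.  */
      c = c->sib;
      if (c->tag == TAG_variable)
        n++;
      else if (c->tag == TAG_lexical_block)
        n += count_nested_vars (c);
    }
  while (c != last);

  return n;
}
```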
[PATCH 1/4] Write CodeView information about enregistered optimized variables
Enable variable tracking when outputting CodeView debug information, and make it so that we issue debug symbols for optimized variables in registers. This consists of S_LOCAL symbols, which give the name and the type of local variables, followed by S_DEFRANGE_REGISTER symbols for the register and the code for which this applies. gcc/ * dwarf2codeview.cc (enum cv_sym_type): Add S_LOCAL and S_DEFRANGE_REGISTER. (write_s_local): New function. (write_defrange_register): New function. (write_optimized_local_variable_loc): New function. (write_optimized_local_variable): New function. (write_optimized_function_vars): New function. (write_function): Call write_optimized_function_vars if variable tracking enabled. * dwarf2out.cc (typedef var_loc_view): Move to dwarf2out.h. (struct dw_loc_list_struct): Likewise. * dwarf2out.h (typedef var_loc_view): Move from dwarf2out.cc. (struct dw_loc_list_struct): Likewise. * opts.cc (finish_options): Enable variable tracking for CodeView. --- gcc/dwarf2codeview.cc | 316 +- gcc/dwarf2out.cc | 37 - gcc/dwarf2out.h | 37 + gcc/opts.cc | 2 +- 4 files changed, 353 insertions(+), 39 deletions(-) diff --git a/gcc/dwarf2codeview.cc b/gcc/dwarf2codeview.cc index e01515a0ec4..15253978968 100644 --- a/gcc/dwarf2codeview.cc +++ b/gcc/dwarf2codeview.cc @@ -77,6 +77,8 @@ enum cv_sym_type { S_GDATA32 = 0x110d, S_REGREL32 = 0x, S_COMPILE3 = 0x113c, + S_LOCAL = 0x113e, + S_DEFRANGE_REGISTER = 0x1141, S_LPROC32_ID = 0x1146, S_GPROC32_ID = 0x1147, S_PROC_ID_END = 0x114f @@ -1946,6 +1948,56 @@ end: free (s->data_symbol.name); } +/* Write an S_LOCAL symbol, representing an optimized variable. This is then + followed by various S_DEFRANGE_* symbols, which describe how to find the + value of a variable and the range for which this is valid. 
*/ + +static void +write_s_local (dw_die_ref die) +{ + unsigned int label_num = ++sym_label_num; + const char *name = get_AT_string (die, DW_AT_name); + uint32_t type; + + /* This is struct LOCALSYM in Microsoft's cvinfo.h: + +struct LOCALSYM { + uint16_t reclen; + uint16_t rectyp; + uint32_t typind; + uint16_t flags; + char name[]; +}; + */ + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, + "%L" SYMBOL_END_LABEL "%u - %L" SYMBOL_START_LABEL "%u\n", + label_num, label_num); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_START_LABEL, label_num); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, S_LOCAL); + putc ('\n', asm_out_file); + + type = get_type_num (get_AT_ref (die, DW_AT_type), false, false); + + fputs (integer_asm_op (4, false), asm_out_file); + fprint_whex (asm_out_file, type); + putc ('\n', asm_out_file); + + fputs (integer_asm_op (2, false), asm_out_file); + fprint_whex (asm_out_file, 0); + putc ('\n', asm_out_file); + + ASM_OUTPUT_ASCII (asm_out_file, name, strlen (name) + 1); + + ASM_OUTPUT_ALIGN (asm_out_file, 2); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); +} + /* Write an S_LDATA32 symbol, representing a static variable within a function. This symbol can also appear outside of a function block - see write_data_symbol. */ @@ -2278,6 +2330,194 @@ write_fbreg_variable (dw_die_ref die, dw_loc_descr_ref loc_ref, targetm.asm_out.internal_label (asm_out_file, SYMBOL_END_LABEL, label_num); } +/* Write an S_DEFRANGE_REGISTER symbol, which describes a range for which an + S_LOCAL variable is held in a certain register. 
*/ + +static void +write_defrange_register (dw_loc_descr_ref expr, rtx range_start, rtx range_end) +{ + unsigned int label_num = ++sym_label_num; + uint16_t regno; + + /* This is defrange_register in binutils and DEFRANGESYMREGISTER in + Microsoft's cvinfo.h: + + struct lvar_addr_range + { + uint32_t offset; + uint16_t section; + uint16_t length; + } ATTRIBUTE_PACKED; + + struct lvar_addr_gap { + uint16_t offset; + uint16_t length; + } ATTRIBUTE_PACKED; + + struct defrange_register + { + uint16_t size; + uint16_t kind; + uint16_t reg; + uint16_t attributes; + struct lvar_addr_range range; + struct lvar_addr_gap gaps[]; + } ATTRIBUTE_PACKED; + */ + + if (expr->dw_loc_opc == DW_OP_regx) +regno = dwarf_reg_to_cv (expr->dw_loc_oprnd1.v.val_int); + else +regno = dwarf_reg_to_cv (expr->dw_loc_opc - DW_OP_reg0); + + if (regno == 0) +return; + + fputs (integer_asm_op (2, false), asm_out_file); + asm_fprintf (asm_out_file, + "%L" SYMBOL_END_LABEL "%u - %L" SYMBOL_START_LABEL "%u\n", + label_num, label_num); + + targetm.asm_out.internal_label (asm_out_file, SYMBOL_START_L
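Before mapping to a CodeView register with `dwarf_reg_to_cv`, `write_defrange_register` recovers the DWARF register number either from the `DW_OP_regx` operand or from the opcode's distance from `DW_OP_reg0`. A standalone sketch of that decoding, assuming the DWARF specification's opcode values. Unlike the patch, which relies on its caller to pass only register opcodes, this hypothetical helper rejects anything else explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* DWARF opcode values from the DWARF specification.  */
enum
{
  OP_reg0 = 0x50,
  OP_reg31 = 0x6f,
  OP_regx = 0x90
};

/* Extract the DWARF register number from a register location
   expression.  OPERAND is only used for DW_OP_regx.  Returns false for
   anything that is not a register location.  */
static bool
dwarf_location_regno (uint8_t opcode, uint64_t operand, unsigned int *regno)
{
  if (opcode == OP_regx)
    {
      *regno = (unsigned int) operand;
      return true;
    }
  if (opcode >= OP_reg0 && opcode <= OP_reg31)
    {
      *regno = opcode - OP_reg0;
      return true;
    }
  return false;
}
```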
RE: [PATCH v1] Test: Move pr116278 run test to c-torture [NFC]
Sure, will send v2 for this. Pan -Original Message- From: Jeff Law Sent: Sunday, August 18, 2024 11:19 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: richard.guent...@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com; s...@gentoo.org Subject: Re: [PATCH v1] Test: Move pr116278 run test to c-torture [NFC] On 8/18/24 1:13 AM, pan2...@intel.com wrote: > From: Pan Li > > Move the run test of pr116278 to c-torture and leave the risc-v the > asm check under risc-v part. > > PR target/116278 > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/pr116278-run-1.c: Take compile instead of > run test. > * gcc.target/riscv/pr116278-run-2.c: Ditto. > * gcc.c-torture/execute/pr116278-run-1.c: New test. > * gcc.c-torture/execute/pr116278-run-2.c: New test. We should be using the dg-torture framework, so the right directory for the test is gcc.dg/torture. I suspect these tests (just based on the constants that appear) may not work on the 16 bit integer targets. So we may need /* { dg-require-effective-target int32 } */ But I don't mind faulting that in if/when we see the 16bit int targets complain. So OK in the right directory (gcc.dg/torture). Jeff
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Oops, let me double check what happened to my local tester. Pan -Original Message- From: Jeff Law Sent: Sunday, August 18, 2024 11:21 PM To: Li, Pan2 ; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2 On 8/18/24 12:10 AM, pan2...@intel.com wrote: > From: Pan Li > > This patch would like to add test cases for the unsigned scalar quad and > oct .SAT_TRUNC form 2. Aka: > > Form 2: >#define DEF_SAT_U_TRUC_FMT_2(NT, WT) \ >NT __attribute__((noinline)) \ >sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \ >{\ > WT max = (WT)(NT)-1; \ > return x > max ? (NT) max : (NT)x; \ >} > > QUAD: > DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t) > DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t) > > OCT: > DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t) > > The below test is passed for this patch. > * The rv64gcv regression test. > > gcc/testsuite/ChangeLog: > > * gcc.target/riscv/sat_u_trunc-10.c: New test. > * gcc.target/riscv/sat_u_trunc-11.c: New test. > * gcc.target/riscv/sat_u_trunc-12.c: New test. > * gcc.target/riscv/sat_u_trunc-run-10.c: New test. > * gcc.target/riscv/sat_u_trunc-run-11.c: New test. > * gcc.target/riscv/sat_u_trunc-run-12.c: New test. Looks like they're failing in the upstream pre-commit tester: > https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578 jeff
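The form 2 idiom quoted above can be instantiated and exercised as ordinary C outside the RISC-V harness. A small sketch using the macro from the patch description unchanged, expanding the quad and oct variants. The comments and sample inputs are illustrative and are not taken from the new run tests.

```c
#include <stdint.h>

/* Form 2 of unsigned saturating truncation, as quoted in the patch:
   clamp X to the maximum value of the narrow type NT, then truncate.  */
#define DEF_SAT_U_TRUC_FMT_2(NT, WT)             \
  NT __attribute__((noinline))                   \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x)       \
  {                                              \
    WT max = (WT)(NT)-1;                         \
    return x > max ? (NT) max : (NT)x;           \
  }

/* Quad: the wide type is four times the width of the narrow type.  */
DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)

/* Oct: the wide type is eight times the width of the narrow type.  */
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)
```

For example, `sat_u_truc_uint32_t_to_uint8_t_fmt_2 (256)` returns 255, while any input up to 255 passes through unchanged.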
Re: [PATCH 00/22] Support AVX10.2 ymm rounding
On Wed, Aug 14, 2024 at 5:07 PM Haochen Jiang wrote: > > Hi all, > > The initial patch for AVX10.2 has been merged this week. > > For the upcoming patches, we will first upstream ymm rounding control part. > > In ymm rounding part, ALL the instructions in AVX512 with 512-bit rounding > control will also have 256-bit rounding control in AVX10.2. > > For clearness, the patch order is based on alphabetical order. Each patch > will include its intrin definition and related tests. Sometimes pattern is > not changed in the patch because the previous change in the patch series > has already enabled the 256 bit rounding in the pattern. > > Bootstrapped on x86-64-pc-linux-gnu. Ok for trunk? Ok for all 22 patches in the thread. > > Thx, > Haochen > > Ref: Intel Advanced Vector Extensions 10.2 Architecture Specification > https://cdrdv2.intel.com/v1/dl/getContent/828965 > > -- BR, Hongtao
[PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
From: Pan Li

This patch adds test cases for the unsigned scalar quad and oct .SAT_TRUNC form 2, i.e.:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {\
    WT max = (WT)(NT)-1; \
    return x > max ? (NT) max : (NT)x; \
  }

QUAD:
DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)

OCT:
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)

The following test passes with this patch:
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_u_trunc-10.c: New test.
	* gcc.target/riscv/sat_u_trunc-11.c: New test.
	* gcc.target/riscv/sat_u_trunc-12.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-10.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-11.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-12.c: New test.

Signed-off-by: Pan Li
---
 .../gcc.target/riscv/sat_u_trunc-10.c     | 17 
 .../gcc.target/riscv/sat_u_trunc-11.c     | 17 
 .../gcc.target/riscv/sat_u_trunc-12.c     | 20 +++
 .../gcc.target/riscv/sat_u_trunc-run-10.c | 16 +++
 .../gcc.target/riscv/sat_u_trunc-run-11.c | 16 +++
 .../gcc.target/riscv/sat_u_trunc-run-12.c | 16 +++
 6 files changed, 102 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-10.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-11.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-12.c

diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c
new file mode 100644
index 000..7dfc740c54f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-10.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_truc_uint32_t_to_uint8_t_fmt_2:
+** sltiu\s+[atx][0-9]+,\s*a0,\s*255
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** ret
+*/
+DEF_SAT_U_TRUC_FMT_2(uint8_t, uint32_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c
new file mode 100644
index 000..c50ae96f47d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-11.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_truc_uint64_t_to_uint8_t_fmt_2:
+** sltiu\s+[atx][0-9]+,\s*a0,\s*255
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** andi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*0xff
+** ret
+*/
+DEF_SAT_U_TRUC_FMT_2(uint8_t, uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c
new file mode 100644
index 000..61331cee6fa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-12.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc -mabi=lp64d -O3 -fdump-rtl-expand-details -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include "sat_arith.h"
+
+/*
+** sat_u_truc_uint64_t_to_uint16_t_fmt_2:
+** li\s+[atx][0-9]+,\s*65536
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** sltu\s+[atx][0-9]+,\s*a0,\s*[atx][0-9]+
+** addi\s+[atx][0-9]+,\s*[atx][0-9]+,\s*-1
+** or\s+[atx][0-9]+,\s*[atx][0-9]+,\s*[atx][0-9]+
+** slli\s+a0,\s*a0,\s*48
+** srli\s+a0,\s*a0,\s*48
+** ret
+*/
+DEF_SAT_U_TRUC_FMT_2(uint16_t, uint64_t)
+
+/* { dg-final { scan-rtl-dump-times ".SAT_TRUNC " 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-10.c b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-10.c
new file mode 100644
index 000..4bc9303e457
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/sat_u_trunc-run-10.c
@@ -0,0 +1,16 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99" } */
+
+#include "sat_arith.h"
+#include "sat_arith_data.h"
+
+#define T1 uint8_t
+#define T2 uint32_t
+
+DEF_SAT_U_TRUC_FMT_2_WRAP(T1, T2)
+
+#define DATA TEST_UNARY_DATA_WRAP(T1, T2)
+#define T
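For readers without sat_arith.h at hand, the sketch below instantiates the form 2 macro exactly as quoted in the commit message for the quad and oct cases. It also adds a hypothetical function, sat_u_trunc_u32_to_u8_branchless (not part of the patch), that spells out in C what the sltiu/addi/or/andi sequence expected by sat_u_trunc-10.c computes. Treat it as an illustration under those assumptions, not as the contents of the real header.

```c
#include <stdint.h>

/* Form 2 as quoted in the commit message: clamp x to the largest value
   of the narrow type NT, then truncate.  */
#define DEF_SAT_U_TRUC_FMT_2(NT, WT)        \
  NT __attribute__((noinline))              \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x)  \
  {                                         \
    WT max = (WT)(NT)-1;                    \
    return x > max ? (NT) max : (NT)x;      \
  }

/* QUAD: the wide type is four times as wide as the narrow type.  */
DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)

/* OCT: the wide type is eight times as wide as the narrow type.  */
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)

/* Hypothetical branchless equivalent mirroring the expected
   sltiu/addi/or/andi sequence for the uint32_t -> uint8_t case.  */
static inline uint8_t
sat_u_trunc_u32_to_u8_branchless (uint32_t x)
{
  uint32_t lt = x < 255u;                 /* sltiu: 1 if x < 255 */
  uint32_t mask = lt - 1u;                /* addi -1: 0 if x < 255, else all ones */
  return (uint8_t) ((mask | x) & 0xffu);  /* or, then andi 0xff */
}
```

When x is at least 255 the mask is all ones, so the OR saturates every bit and the final AND leaves 255; otherwise the mask is zero and x passes through unchanged.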
[PATCH v2] Test: Move pr116278 run test to dg/torture [NFC]
From: Pan Li

Move the run tests of pr116278 to dg/torture and leave the RISC-V asm checks in the RISC-V directory.

	PR target/116278

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/pr116278-run-1.c: Compile instead of run.
	* gcc.target/riscv/pr116278-run-2.c: Ditto.
	* gcc.dg/torture/pr116278-run-1.c: New test.
	* gcc.dg/torture/pr116278-run-2.c: New test.

Signed-off-by: Pan Li
---
 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c | 19 +++
 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c | 19 +++
 .../gcc.target/riscv/pr116278-run-1.c         |  2 +-
 .../gcc.target/riscv/pr116278-run-2.c         |  2 +-
 4 files changed, 40 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr116278-run-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
new file mode 100644
index 000..8e07fb6af29
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-1.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+/* { dg-options "-O2" } */
+
+#include
+
+int8_t b[1];
+int8_t *d = b;
+int32_t c;
+
+int main() {
+  b[0] = -40;
+  uint16_t t = (uint16_t)d[0];
+
+  c = (t < 0xFFF6 ? t : 0xFFF6) + 9;
+
+  if (c != 65505)
+    __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
new file mode 100644
index 000..d85e21531e1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr116278-run-2.c
@@ -0,0 +1,19 @@
+/* { dg-do run } */
+/* { dg-require-effective-target int32 } */
+/* { dg-options "-O2" } */
+
+#include
+
+int16_t b[1];
+int16_t *d = b;
+int64_t c;
+
+int main() {
+  b[0] = -40;
+  uint32_t t = (uint32_t)d[0];
+
+  c = (t < 0xFFF6u ? t : 0xFFF6u) + 9;
+
+  if (c != 4294967265)
+    __builtin_abort ();
+}
diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
index d3812bdcdfb..c758fca7975 100644
--- a/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
+++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-1.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { riscv_v } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-expand-details" } */
 
 #include
diff --git a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
index 669cd4f003f..a4da8a323f0 100644
--- a/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
+++ b/gcc/testsuite/gcc.target/riscv/pr116278-run-2.c
@@ -1,4 +1,4 @@
-/* { dg-do run { target { riscv_v } } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fdump-rtl-expand-details" } */
 
 #include
-- 
2.43.0
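The expected value 65505 in pr116278-run-1.c follows from ordinary C conversion rules. The helper below is a hypothetical name, not part of the patch. It spells out each step so the constant can be checked by hand, independent of the optimization level the torture harness picks.

```c
#include <stdint.h>

/* Hypothetical helper reproducing the arithmetic of pr116278-run-1.c.  */
int32_t
pr116278_run_1_expected (int8_t v)
{
  /* Converting -40 to uint16_t wraps modulo 2^16: 65536 - 40 = 65496.  */
  uint16_t t = (uint16_t) v;

  /* t and 0xFFF6 (65526) are both promoted to int, so this is a plain
     signed minimum; 65496 < 65526 selects t.  */
  int m = t < 0xFFF6 ? t : 0xFFF6;

  /* 65496 + 9 = 65505, which fits in int32_t without overflow.  */
  return (int32_t) (m + 9);
}
```

With v = -40 the helper yields the 65505 the test checks for. With v = -1, t is 65535, so the minimum clamps to 65526 and the result is 65535.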
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
Please ignore this patch; it was sent by mistake.

Pan

-Original Message-
From: Li, Pan2
Sent: Monday, August 19, 2024 10:04 AM
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp@gmail.com; Li, Pan2
Subject: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

From: Pan Li

This patch would like to add test cases for the unsigned scalar quad and oct .SAT_TRUNC form 2. Aka:

Form 2:
  #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
  NT __attribute__((noinline)) \
  sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
  {\
    WT max = (WT)(NT)-1; \
    return x > max ? (NT) max : (NT)x; \
  }

QUAD:
DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)

OCT:
DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)

The below test is passed for this patch.
* The rv64gcv regression test.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_u_trunc-10.c: New test.
	* gcc.target/riscv/sat_u_trunc-11.c: New test.
	* gcc.target/riscv/sat_u_trunc-12.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-10.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-11.c: New test.
	* gcc.target/riscv/sat_u_trunc-run-12.c: New test.
[PATCHv4, expand] Add const0 move checking for CLEAR_BY_PIECES optabs
Hi,
  This patch adds const0 move checking for CLEAR_BY_PIECES. The original code requires vec_duplicate, which handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it can still clear by pieces as long as it supports a const0 move in that mode.

  Compared to the previous version, the main change is a new function that generates const0 for a given mode, which is used as the by_pieces_constfn for CLEAR_BY_PIECES.
https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

  On i386, the patch causes several regressions. One issue is that the predicate of the V16QI move expander doesn't include const0, so V16QImode can't be used for clear by pieces with the patch. The second issue is that, with the patch, const0 is passed directly to the move expander. Originally it was forced into a pseudo, and i386 could leverage the previous data to optimize.

  The patch also causes several regressions on aarch64. V2x8QImode replaces TImode for 16-byte clear by pieces, because the V2x8QImode move expander supports const0 and vector modes are preferred. I drafted a patch to address the issue; it will be sent for review in a separate email. Another problem is that V8QImode replaces DImode for 8-byte clear by pieces. This seems to produce a different instruction sequence, but the actual instructions are the same.

Thanks
Gui Haochen

ChangeLog
expand: Add const0 move checking for CLEAR_BY_PIECES optabs

vec_duplicate handles duplicates of non-constant inputs, but 0 is a constant. So even if a platform doesn't support vec_duplicate, it can still clear by pieces if it supports a const0 move. This patch adds that check.

gcc/
	* expr.cc (by_pieces_mode_supported_p): Add const0 move checking
	for CLEAR_BY_PIECES.
	(set_zero): New.
	(clear_by_pieces): Pass set_zero as by_pieces_constfn.
patch.diff
diff --git a/gcc/expr.cc b/gcc/expr.cc
index ffbac513692..7199e0956f8 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -1014,14 +1014,20 @@ can_use_qi_vectors (by_pieces_operation op)
 static bool
 by_pieces_mode_supported_p (fixed_size_mode mode, by_pieces_operation op)
 {
-  if (optab_handler (mov_optab, mode) == CODE_FOR_nothing)
+  enum insn_code icode = optab_handler (mov_optab, mode);
+  if (icode == CODE_FOR_nothing)
     return false;
 
-  if ((op == SET_BY_PIECES || op == CLEAR_BY_PIECES)
+  if (op == SET_BY_PIECES
       && VECTOR_MODE_P (mode)
       && optab_handler (vec_duplicate_optab, mode) == CODE_FOR_nothing)
     return false;
 
+  if (op == CLEAR_BY_PIECES
+      && VECTOR_MODE_P (mode)
+      && !insn_operand_matches (icode, 1, CONST0_RTX (mode)))
+    return false;
+
   if (op == COMPARE_BY_PIECES
       && !can_compare_p (EQ, mode, ccp_jump))
     return false;
@@ -1840,16 +1846,20 @@ store_by_pieces (rtx to, unsigned HOST_WIDE_INT len,
   return to;
 }
 
+static rtx
+set_zero (void *, void *, HOST_WIDE_INT, fixed_size_mode mode)
+{
+  return CONST0_RTX (mode);
+}
+
 void
 clear_by_pieces (rtx to, unsigned HOST_WIDE_INT len, unsigned int align)
 {
   if (len == 0)
     return;
 
-  /* Use builtin_memset_read_str to support vector mode broadcast.  */
-  char c = 0;
-  store_by_pieces_d data (to, builtin_memset_read_str, &c, len, align,
-			  CLEAR_BY_PIECES);
+  /* Use set_zero to generate const0 of a certain mode.  */
+  store_by_pieces_d data (to, set_zero, NULL, len, align, CLEAR_BY_PIECES);
   data.run ();
 }
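As a rough mental model only (this is not GCC's by-pieces code), clearing by pieces walks the length from the widest usable store size down to the narrowest and emits zero stores of each size. The sketch assumes a hypothetical `supported` bitmask of store widths for which a const0 move is available. Under the patch above, a vector mode counts as usable for CLEAR_BY_PIECES when its move pattern accepts CONST0_RTX as operand 1, even without vec_duplicate. The real implementation also handles alignment and overlapping tail stores, which this sketch ignores.

```c
#include <stddef.h>

/* Hypothetical model of clear-by-pieces.  SUPPORTED has bit K set when a
   zero store of 2^K bytes is available (for example bit 4 for a 16-byte
   vector mode whose move pattern accepts const0).  Each chosen store size
   is recorded in PIECES.  Returns the number of stores, or -1 if LEN
   cannot be covered or MAX_PIECES is too small.  */
int
clear_by_pieces_plan (size_t len, unsigned supported,
                      size_t *pieces, int max_pieces)
{
  int n = 0;
  for (int k = 4; k >= 0; k--)
    {
      size_t size = (size_t) 1 << k;
      if (!(supported & (1u << k)))
        continue;
      /* Greedily use the widest remaining size as often as it fits.  */
      while (len >= size)
        {
          if (n == max_pieces)
            return -1;
          pieces[n++] = size;
          len -= size;
        }
    }
  return len == 0 ? n : -1;
}
```

For example, with all of 16, 8, 4, 2 and 1 bytes usable, a 23-byte clear becomes stores of 16, 4, 2 and 1 bytes. If the 16-byte mode is rejected, a 16-byte clear falls back to two 8-byte stores.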
Re: [PATCHv2, aarch64] Implement 16-byte vector mode const0 store by TImode
No regressions are reported. Committed as r15-3013.
https://gcc.gnu.org/pipermail/gcc-cvs/2024-August/408072.html

Thanks
Gui Haochen

On 2024/8/16 10:31, HAO CHEN GUI wrote:
> Hi,
>   I submitted a patch to change the mode checking for
> CLEAR_BY_PIECES.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660344.html
>
>   It causes some regressions on aarch64. With the patch,
> V2x8QImode is used to do clear by pieces instead of TImode, as
> vector modes are preferred and V2x8QImode supports const0 store.
> Thus the efficient "stp" instructions can't be generated.
>
>   I drafted the following patch to fix the problem. It fixes the
> regressions found in memset-corner-cases.c, memset-q-reg.c,
> auto-init-padding-11.c and auto-init-padding-5.c.
>
>   Compared to the previous version, the main changes are:
> 1. Support all 16-byte vector modes.
> 2. Check the memory address when a pseudo can't be created.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-August/660349.html
>
>   I am sending the patch to trigger the automated CI. The
> cfarm server is too slow to finish the regression test overnight.
>
>   I will check in the patch if there are no regressions and no one
> objects to it.
>
> Thanks
> Gui Haochen
>
> ChangeLog
> aarch64: Implement 16-byte vector mode const0 store by TImode
>
> gcc/
> 	* config/aarch64/aarch64-simd.md (mov for VSTRUCT_QD):
> 	Expand 16-byte vector mode const0 store by TImode.
>
> patch.diff
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index 01b084d8ccb..acf86e191c7 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -7766,7 +7766,16 @@ (define_expand "mov"
>  	(match_operand:VSTRUCT_QD 1 "general_operand"))]
>    "TARGET_FLOAT"
>  {
> -  if (can_create_pseudo_p ())
> +  if (known_eq (GET_MODE_SIZE (mode), 16)
> +      && operands[1] == CONST0_RTX (mode)
> +      && MEM_P (operands[0])
> +      && (can_create_pseudo_p ()
> +	  || memory_address_p (TImode, XEXP (operands[0], 0
> +    {
> +      operands[0] = adjust_address (operands[0], TImode, 0);
> +      operands[1] = CONST0_RTX (TImode);
> +    }
> +  else if (can_create_pseudo_p ())
>      {
>        if (GET_CODE (operands[0]) != REG)
>  	operands[1] = force_reg (mode, operands[1]);
RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2
It turns out that the pre-commit tester didn't pick up the newest upstream when testing this patch.

Pan

-Original Message-
From: Li, Pan2
Sent: Monday, August 19, 2024 9:25 AM
To: Jeff Law ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: RE: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

Oops, let me double-check what happened to my local tester.

Pan

-Original Message-
From: Jeff Law
Sent: Sunday, August 18, 2024 11:21 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; rdapp@gmail.com
Subject: Re: [PATCH v1 1/2] RISC-V: Add testcases for unsigned scalar quad and oct .SAT_TRUNC form 2

On 8/18/24 12:10 AM, pan2...@intel.com wrote:
> From: Pan Li
>
> This patch would like to add test cases for the unsigned scalar quad and
> oct .SAT_TRUNC form 2. Aka:
>
> Form 2:
>   #define DEF_SAT_U_TRUC_FMT_2(NT, WT) \
>   NT __attribute__((noinline)) \
>   sat_u_truc_##WT##_to_##NT##_fmt_2 (WT x) \
>   {\
>     WT max = (WT)(NT)-1; \
>     return x > max ? (NT) max : (NT)x; \
>   }
>
> QUAD:
> DEF_SAT_U_TRUC_FMT_2 (uint16_t, uint64_t)
> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint32_t)
>
> OCT:
> DEF_SAT_U_TRUC_FMT_2 (uint8_t, uint64_t)
>
> The below test is passed for this patch.
> * The rv64gcv regression test.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.target/riscv/sat_u_trunc-10.c: New test.
> 	* gcc.target/riscv/sat_u_trunc-11.c: New test.
> 	* gcc.target/riscv/sat_u_trunc-12.c: New test.
> 	* gcc.target/riscv/sat_u_trunc-run-10.c: New test.
> 	* gcc.target/riscv/sat_u_trunc-run-11.c: New test.
> 	* gcc.target/riscv/sat_u_trunc-run-12.c: New test.
Looks like they're failing in the upstream pre-commit tester:

> https://github.com/ewlu/gcc-precommit-ci/issues/2066#issuecomment-2295137578

jeff
[PATCH v3] RISC-V: Support IMM for operand 0 of ussub pattern
From: Pan Li

This patch allows an immediate (IMM) for operand 0 of the ussub pattern, i.e. .SAT_SUB(1023, y) as in the example below.

Form 1:
  #define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
  T __attribute__((noinline))             \
  sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)  \
  {                                       \
    return (T)IMM >= y ? (T)IMM - y : 0;  \
  }

DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)

Before this patch (code shown for IMM = 82):
  sat_u_sub_imm82_uint64_t_fmt_1:
        li      a5,82
        bgtu    a0,a5,.L3
        sub     a0,a5,a0
        ret
  .L3:
        li      a0,0
        ret

After this patch (code shown for IMM = 82):
  sat_u_sub_imm82_uint64_t_fmt_1:
        li      a5,82
        sltu    a4,a5,a0
        addi    a4,a4,-1
        sub     a0,a5,a0
        and     a0,a4,a0
        ret

The following test suites pass with this patch:
1. The rv64gcv full regression test.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
	function to generate an Xmode REG rtx from an operand rtx.
	(riscv_expand_ussub): Generate an Xmode reg for operand 1.
	* config/riscv/riscv.md: Allow const_int for operand 1.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/sat_arith.h: Add test helper macro.
	* gcc.target/riscv/sat_u_sub_imm-1.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-1_1.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-1_2.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-2.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-2_1.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-2_2.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-3.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-3_1.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-3_2.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-4.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-run-1.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-run-2.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-run-3.c: New test.
	* gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.
Signed-off-by: Pan Li
---
 gcc/config/riscv/riscv.cc                     | 46 ++-
 gcc/config/riscv/riscv.md                     |  2 +-
 gcc/testsuite/gcc.target/riscv/sat_arith.h    | 10 
 .../gcc.target/riscv/sat_u_sub_imm-1.c        | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-1_1.c      | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-1_2.c      | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-2.c        | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-2_1.c      | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-2_2.c      | 22 
 .../gcc.target/riscv/sat_u_sub_imm-3.c        | 20 +++
 .../gcc.target/riscv/sat_u_sub_imm-3_1.c      | 21 +++
 .../gcc.target/riscv/sat_u_sub_imm-3_2.c      | 22 
 .../gcc.target/riscv/sat_u_sub_imm-4.c        | 19 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-1.c    | 56 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-2.c    | 56 +++
 .../gcc.target/riscv/sat_u_sub_imm-run-3.c    | 55 ++
 .../gcc.target/riscv/sat_u_sub_imm-run-4.c    | 48 
 17 files changed, 477 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-1_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-2_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3_1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-3_2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/sat_u_sub_imm-run-4.c

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f266c45ed4d..5e6f3ba10e4 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -11893,6 +11893,50 @@ riscv_expand_usadd (rtx dest, rtx x, rtx y)
   emit_move_insn (dest, gen_lowpart (mode, xmode_dest));
 }
 
+/* Generate a REG rtx of Xmode from the given rtx and mode.
+   The rtx x can be REG (QI/HI/SI/DI) or const_int.
+   The machine_mode mode is the original mode from define pattern.
+
+   If rtx is REG, the gen_lowpart of Xmode will be returned.
+
+   If rtx is const_int, a new REG rtx will be created to hold the value of
+   const_int and then returned.
+
+   According to the gccint doc, the constants generated for modes with fewer
+   bits than in HOST_WIDE_INT
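The "after" listing is a branchless saturating subtraction. The C sketch below instantiates form 1 for IMM = 82 to match both listings. Next to it is a hypothetical function, sat_u_sub_imm82_branchless (not part of the patch), that maps each instruction of the new sequence to one C statement. sltu detects that IMM - y would wrap, addi turns that bit into an all-ones or all-zeros mask, and the final and keeps IMM - y only when it did not wrap.

```c
#include <stdint.h>

/* Form 1 from the commit message, instantiated for IMM = 82 to match the
   before/after listings.  */
#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM)        \
  T __attribute__((noinline))                  \
  sat_u_sub_imm##IMM##_##T##_fmt_1 (T y)       \
  {                                            \
    return (T)IMM >= y ? (T)IMM - y : 0;       \
  }

DEF_SAT_U_SUB_IMM_FMT_1 (uint64_t, 82)

/* Hypothetical C rendering of the branchless "after" sequence.  */
uint64_t
sat_u_sub_imm82_branchless (uint64_t y)
{
  uint64_t imm = 82;        /* li   a5,82 */
  uint64_t lt = imm < y;    /* sltu a4,a5,a0: 1 when IMM - y would wrap */
  uint64_t mask = lt - 1;   /* addi a4,a4,-1: all ones if no wrap, else 0 */
  uint64_t diff = imm - y;  /* sub  a0,a5,a0: may wrap, masked off below */
  return mask & diff;       /* and  a0,a4,a0 */
}
```

Both functions agree for every y. The branchless form trades the bgtu and the second return path for two extra ALU instructions.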
[PATCH] RISC-V: Enable -gvariable-location-views by default
This affects only the RISC-V targets, where the compiler options -gvariable-location-views, and consequently also -ginline-points, are disabled by default. This is unexpected and disables some useful features of the generated debug info.

Due to a bug in the gas assembler, the .loc statement cannot be used to generate location view debug info. Configure detects this:

configure:31500: checking assembler for dwarf2 debug_view support
configure:31509: .../riscv-unknown-elf/bin/as -o conftest.o conftest.s >&5
conftest.s: Assembler messages:
conftest.s:5: Error: .uleb128 only supports constant or subtract expressions
conftest.s:6: Error: .uleb128 only supports constant or subtract expressions
configure:31512: $? = 1
configure: failed program was
	.file 1 "conftest.s"
	.loc 1 3 0 view .LVU1
	nop
	.data
	.uleb128 .LVU1
	.uleb128 .LVU1
configure:31523: result: no

This results in dwarf2out_as_locview_support being set to false, which sets off a chain of events whose end result is that most inlined functions have either no DW_AT_entry_pc or one with a wrong entry PC value.

However, location views can also be generated without any .loc statements, so we should enable -gvariable-location-views by default, regardless of the status of -gas-locview-support.

---
Regression-tested on riscv-unknown-elf and riscv64-unknown-elf. OK for trunk?

 gcc/toplev.cc | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index eee4805b504..292948122de 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -1475,9 +1475,7 @@ process_options ()
 	= (flag_var_tracking
 	   && debug_info_level >= DINFO_LEVEL_NORMAL
 	   && dwarf_debuginfo_p ()
-	   && !dwarf_strict
-	   && dwarf2out_as_loc_support
-	   && dwarf2out_as_locview_support);
+	   && !dwarf_strict);
     }
   else if (debug_variable_location_views == -1 && dwarf_version != 5)
     {
-- 
2.39.2
[PING][PATCH][avr,v2] PR115830: Make better use of SREG
Ping for the patch that makes better use of SREG and includes some code clean-ups. It is for trunk; there are no new regressions.

https://gcc.gnu.org/pipermail/gcc-patches/2024-August/659422.html

Johann

--

AVR: target/115830 - Make better use of SREG.N and SREG.Z.

This patch adds new CC modes CCN and CCZN for operations that set SREG.N, or SREG.Z and SREG.N, respectively. It adds peephole2 patterns that generate new compute + branch insns making use of the Z and N flags. Most of these patterns need their own asm output routines that don't do all the micro-optimizations the ordinary outputs may perform, since the latter are not required to set CC in a usable way. We don't use cmpelim because it cannot provide scratch regs (which peephole2 can): some of the patterns require a scratch reg, whereas the same operations that don't set REG_CC don't require one. See the comments in avr.md for details.

The existing add.for.cc* patterns are simplified, as they no longer cover QImode, which is handled in a separate QImode case. Apart from that, the patch adds 3 patterns for subtractions and one pattern for shift left, all for multi-byte cases (HI, PSI, SI). The add.for.cc* patterns now use CC[Z]Nmode instead of the former abuse of CCmode.

	PR target/115830
gcc/
	* config/avr/avr-modes.def (CCN, CCZN): New CC_MODEs.
	* config/avr/avr-protos.h (avr_cond_branch): New from ret_cond_branch.
	(avr_out_plus_set_N, avr_op8_ZN_operator)
	(avr_out_op8_set_ZN, avr_len_op8_set_ZN): New protos.
	(ccn_reg_rtx, cczn_reg_rtx): New declarations.
	* config/avr/avr.cc (avr_cond_branch): New from ret_cond_branch.
	(avr_cond_string): Add bool cc_overflow_unusable argument.
	(avr_print_operand) ['L']: Like 'j' but overflow unusable.
	['K']: Like 'k' but overflow unusable.
	(avr_out_plus_set_ZN): Remove handling of QImode.
	(avr_out_plus_set_N, avr_op8_ZN_operator)
	(avr_out_op8_set_ZN, avr_len_op8_set_ZN): New functions.
	(avr_adjust_insn_length) [ADJUST_LEN_ADD_SET_N]: Handle case.
	(avr_class_max_nregs): All MODE_CCs occupy one hard reg.
	(avr_hard_regno_nregs): Same.
	(avr_hard_regno_mode_ok) [REG_CC]: Allow all MODE_CC.
	(pass_manager.h): Include it.
	(ccn_reg_rtx, cczn_reg_rtx): New GTY variables.
	(avr_init_expanders): Initialize them.
	(avr_option_override): Run peephole2 a second time.
	* config/avr/avr.md (adjust_len) [add_set_N]: New attr value.
	(ALLCC, HI_SI): New mode iterators.
	(CCname): New mode attribute.
	(eqnegtle, cmp_signed, op8_ZN): New code iterators.
	(branch): Handle CCNmode and CCZNmode. Assimilate...
	(difficult_branch): ...this insn.
	(p1m1): Remove.
	(gen_add_for__): Adjust to CCNmode and CCZNmode. Use HISI as
	mode iterator. Extend peephole2s that produce them.
	(*add.for.eqne.): Extend to *add.for.cc[z]n..
	(*ashift.for.ccn.): New insn and peephole2 to make them.
	(*sub.for.cczn., "*sub-extend.for.cczn.: New insns and peephole2s
	to make them.
	(*op8.for.cczn.): New insn and peephole2 to make them.
	* config/avr/predicates.md (const_1_to_3_operand)
	(abs1_abs2_operand, signed_comparison_operator)
	(op8_ZN_operator): New predicates.

gcc/testsuite/
	* gcc.target/avr/pr115830-add.c: New test.
	* gcc.target/avr/pr115830-add-c.c: New test.
	* gcc.target/avr/pr115830-add-i.c: New test.
	* gcc.target/avr/pr115830-and.c: New test.
	* gcc.target/avr/pr115830-asl.c: New test.
	* gcc.target/avr/pr115830-asr.c: New test.
	* gcc.target/avr/pr115830-ior.c: New test.
	* gcc.target/avr/pr115830-lsr.c: New test.
	* gcc.target/avr/pr115830-asl32.c: New test.
	* gcc.target/avr/pr115830-sub.c: New test.
	* gcc.target/avr/pr115830-sub-ext: New test.

diff --git a/gcc/config/avr/avr-modes.def b/gcc/config/avr/avr-modes.def
index e0633d680d5..b756beae9a0 100644
--- a/gcc/config/avr/avr-modes.def
+++ b/gcc/config/avr/avr-modes.def
@@ -18,6 +18,12 @@
 
 FRACTIONAL_INT_MODE (PSI, 24, 3);
 
+/* Used when the N (and Z) flag(s) of SREG are set.
+   The N flag indicates whether the value is negative.
+   The Z flag indicates whether the value is zero.  */
+CC_MODE (CCN);
+CC_MODE (CCZN);
+
 /* Make TA and UTA 64 bits wide.
    128 bit wide modes would be insane on a 8-bit machine.
    This needs special treatment in avr.cc and avr-lib.h.  */
diff --git a/gcc/config/avr/avr-protos.h b/gcc/config/avr/avr-protos.h
index 7b666f17718..fb7e0dc6a15 100644
--- a/gcc/config/avr/avr-protos.h
+++ b/gcc/config/avr/avr-protos.h
@@ -55,7 +55,7 @@ extern const char *avr_out_tsthi (rtx_insn *, rtx*, int*);
 extern const char *avr_out_tstpsi (rtx_insn *, rtx*, int*);
 extern const char *avr_out_compare (rtx_insn *, rtx*, int*);
 extern const char *avr_out_compare64 (rtx_insn *, rtx*, int*);
-extern const char *ret_cond_branch (rtx
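To make the meaning of the new modes concrete, the sketch below models in plain C (not AVR code) the information the avr-modes.def comment attributes to CCZNmode for a 16-bit result: N says whether the value is negative, and Z whether it is zero. The names zn_flags, zn_from_hi and branch_kind are hypothetical. A branch that only asks "zero, negative, or positive?" needs nothing beyond N and Z, so if the arithmetic insn already set them, a separate compare is redundant. N reflects the wrapped 16-bit result, so it matches the true sign only when no signed overflow occurred. This is presumably why the patch adds the "overflow unusable" condition variants ('L' and 'K').

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what CCZNmode describes for a 16-bit result.  */
struct zn_flags
{
  bool n;  /* SREG.N: most significant bit of the result */
  bool z;  /* SREG.Z: the result is zero */
};

struct zn_flags
zn_from_hi (uint16_t result)
{
  struct zn_flags f;
  f.n = (result & 0x8000u) != 0;
  f.z = result == 0;
  return f;
}

/* Classify the wrapped 16-bit sum using only N and Z, the way a branch
   following a flag-setting add could, without re-comparing the sum.  */
int
branch_kind (int16_t x, int16_t y)
{
  struct zn_flags f = zn_from_hi ((uint16_t) (x + y));
  if (f.z)
    return 0;           /* sum is zero */
  return f.n ? -1 : 1;  /* sign of the wrapped 16-bit sum */
}
```

For example, branch_kind (32767, 1) reports a negative result because the 16-bit sum wraps to 0x8000. That is the overflow case where an N-only test differs from a true signed comparison.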