Re: [PATCH v2] c++, coroutines: Improve diagnostics for awaiter/promise.
> On 11 Jun 2025, at 17:51, Jason Merrill wrote:
>
> On 6/9/25 3:49 PM, Iain Sandoe wrote:
>> Hi Jason,
>>
>>>> +      error_at (loc, "%sawaitable type %qT is not a structure",
>>>> +                extra, o_type);
>>> Generally identifiers should be incorporated with %qs, and relying on the
>>> %s to provide a space doesn't seem very i18n-friendly.  Better, I think, to
>>> handle the case with no identifier as a separate diagnostic.
>>
>> Fixed.
>>
>>> It looks like there's no test for the initial_suspend case?
>>
>> Added one, retested on x86_64-darwin, powerpc64le linux, OK for trunk?
>> thanks,
>> Iain
>>
>> --- 8< ---
>>
>> At present, we can issue diagnostics about missing or malformed
>> awaiter or promise methods when we encounter their uses in the
>> body of a user's function.  We might then re-issue the same
>> diagnostics when processing the initial or final await expressions.
>>
>> This change avoids such duplication, and also attempts to
>> identify issues with the initial or final expressions specifically
>> since diagnostics for those do not have any useful line number.
>
> Rather than print just "initial_suspend" or "final_suspend", you might %D the
> called promise member function?

I took a look at this; it seems we’d have to fish the information out of the
target expr (additionally, there could be a co_await operator involved).

I think we can improve the informational additions to the coroutines
diagnostics in several places - perhaps take a pass through doing that once
the functionality issues are sorted out.

> Up to you, the patch is OK as is.

I applied as is, that makes some progress, and expect to revisit more
generally as above.

thanks
Iain

>
>> gcc/cp/ChangeLog:
>>
>>     * coroutines.cc (build_co_await): Identify diagnostics
>>     for initial and final await expressions.
>>     (cp_coroutine_transform::wrap_original_function_body): Do
>>     not handle initial and final await expressions here ...
>>     (cp_coroutine_transform::apply_transforms): ... handle them
>>     here and avoid duplicate diagnostics.
>>     * coroutines.h: Declare initial and final await expressions
>>     in the transform class.
>>
>> gcc/testsuite/ChangeLog:
>>
>>     * g++.dg/coroutines/coro1-missing-await-method.C: Adjust for
>>     improved diagnostics.
>>     * g++.dg/coroutines/pr104051.C: Move to...
>>     * g++.dg/coroutines/pr104051-0.C: ...here.
>>     * g++.dg/coroutines/pr104051-1.C: New test.
>>
>> Signed-off-by: Iain Sandoe
>> ---
>>  gcc/cp/coroutines.cc                          | 24 +++
>>  gcc/cp/coroutines.h                           |  3 +++
>>  .../coroutines/coro1-missing-await-method.C   |  4 ++--
>>  .../coroutines/{pr104051.C => pr104051-0.C}   |  2 +-
>>  gcc/testsuite/g++.dg/coroutines/pr104051-1.C  | 23 ++
>>  5 files changed, 49 insertions(+), 7 deletions(-)
>>  rename gcc/testsuite/g++.dg/coroutines/{pr104051.C => pr104051-0.C} (92%)
>>  create mode 100644 gcc/testsuite/g++.dg/coroutines/pr104051-1.C
>>
>> diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
>> index d482f52fefa..18c0a4812c4 100644
>> --- a/gcc/cp/coroutines.cc
>> +++ b/gcc/cp/coroutines.cc
>> @@ -1277,8 +1277,14 @@ build_co_await (location_t loc, tree a, suspend_point_kind suspend_kind,
>>    if (TREE_CODE (o_type) != RECORD_TYPE)
>>      {
>> -      error_at (loc, "awaitable type %qT is not a structure",
>> -                o_type);
>> +      if (suspend_kind == FINAL_SUSPEND_POINT)
>> +        error_at (loc, "%qs awaitable type %qT is not a structure",
>> +                  "final_suspend()", o_type);
>> +      else if (suspend_kind == INITIAL_SUSPEND_POINT)
>> +        error_at (loc, "%qs awaitable type %qT is not a structure",
>> +                  "initial_suspend()", o_type);
>> +      else
>> +        error_at (loc, "awaitable type %qT is not a structure", o_type);
>>        return error_mark_node;
>>      }
>> @@ -4329,7 +4335,6 @@ cp_coroutine_transform::wrap_original_function_body ()
>>    /* Wrap the function body in a try {} catch (...) {} block, if exceptions
>>       are enabled.  */
>>    tree var_list = NULL_TREE;
>> -  tree initial_await = build_init_or_final_await (fn_start, false);
>>    /* [stmt.return.coroutine] / 3
>>       If p.return_void() is a valid expression, flowing off the end of a
>> @@ -4527,7 +4532,8 @@ cp_coroutine_transform::wrap_original_function_body ()
>>    zero_resume = build2_loc (fn_end, MODIFY_EXPR, act_des_fn_ptr_type,
>>                              resume_fn_ptr, zero_resume);
>>    finish_expr_stmt (zero_resume);
>> -  finish_expr_stmt (build_init_or_final_await (fn_end, true));
>> +  finish_expr_stmt (final_await);
>> +
>>    BIND_EXPR_BODY (update_body) = pop_stmt_list (BIND_EXPR_BODY (update_body));
>>    BIND_EXPR_VARS (update_body) = nreverse (var_list);
>>    BLOCK_VARS (top_block) = BIND_EXPR_VARS (update_body);
>> @@ -5266,6 +5272,16 @@ cp_coroutine_transform::apply_transforms ()
>>      = coro_build_actor_or_destroy_function (orig_fn_decl, act_des_fn_
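For readers who want the review point about %qs in isolation, here is a minimal standalone sketch. It is not GCC code: the enum is a hypothetical stand-in for suspend_point_kind (only INITIAL_SUSPEND_POINT and FINAL_SUSPEND_POINT come from the patch), plain quotes approximate what %qs and %qT print, and std::snprintf replaces error_at. It only shows the shape of the fix: every branch uses one complete message template rather than prepending an optional prefix with %s.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for GCC's suspend_point_kind.  Only the two
// *_SUSPEND_POINT names below appear in the patch; OTHER_SUSPEND_POINT
// is an assumption covering awaits written in the function body.
enum suspend_point_kind
{
  OTHER_SUSPEND_POINT,
  INITIAL_SUSPEND_POINT,
  FINAL_SUSPEND_POINT
};

// Build the "not a structure" message.  Each branch formats one whole
// sentence, so a translator never sees a fragment joined by a bare %s.
std::string
not_a_structure_message (suspend_point_kind kind, const std::string &type)
{
  char buf[256];
  if (kind == FINAL_SUSPEND_POINT)
    std::snprintf (buf, sizeof buf,
                   "'%s' awaitable type '%s' is not a structure",
                   "final_suspend()", type.c_str ());
  else if (kind == INITIAL_SUSPEND_POINT)
    std::snprintf (buf, sizeof buf,
                   "'%s' awaitable type '%s' is not a structure",
                   "initial_suspend()", type.c_str ());
  else
    std::snprintf (buf, sizeof buf,
                   "awaitable type '%s' is not a structure", type.c_str ());
  return buf;
}
```

The real diagnostics go through error_at with a location and a tree type; the sketch keeps only the message selection.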
Re: [PATCH] [lra] force reg update after spilling to memory [PR120424]
This patch introduces an ICE in lra-eliminations.cc:1200 for an existing
test case.

In $builddir/gcc:

$ make -k check-gcc RUNTESTFLAGS="--target_board=atmega128-sim avr-torture.exp=pr118591-1.c"

FAIL: gcc.target/avr/torture/pr118591-1.c -O1 (internal compiler error: in update_reg_eliminate, at lra-eliminations.cc:1200)
FAIL: gcc.target/avr/torture/pr118591-1.c -O2 (internal compiler error: in update_reg_eliminate, at lra-eliminations.cc:1200)
...

Configured with: --target=avr --disable-nls --with-dwarf2 --with-gnu-as --with-gnu-ld --enable-languages=c,c++

Please don't hesitate to ask me if you have problems reproducing the ICE.
Notes on how to reproduce can also be found at
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118591#c3

Johann

--

On 07.06.25 at 09:03, Denis Chertykov wrote:

Alexandre Oliva writes:

On Jun 6, 2025, Alexandre Oliva wrote:

Now, since lra_update_fp2sp_elimination checks that
!elimination_fp2sp_occured_p, we *could* disable the fp2sp elimination,
if it's selected, right away, so that it is not applied after we've
disabled it, and then we don't have to worry about disabling it half-way
or reversing it later.

[lra] inactivate disabled fp2sp elimination

Even after we disable the fp2sp elimination when it is the active
elimination for the fp, spilling might use it before
update_reg_eliminate runs and inactivates it for good.  If it is used,
update_reg_eliminate will fail the check that fp2sp was not used.

Since we keep track of uses of this specific elimination, and
lra_update_fp2sp_elimination checks it before disabling it, we know it
hasn't been used, so we can inactivate it without any ill effects.

This fixes the pr118591-1.c avr-none regression exposed by the
PR120424 fix.

Regstrapped on x86_64-linux-gnu.  Also testing with gcc-14 on arm-vx7r2.
Ok to install?

for  gcc/ChangeLog

    * lra-eliminations.cc (lra_update_fp2sp_elimination):
    Inactivate the unused fp2sp elimination right away.

Thank you.
The patch fixes PR118591.

Denis

---
 gcc/lra-eliminations.cc | 15 ---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/gcc/lra-eliminations.cc b/gcc/lra-eliminations.cc
index bb708b007a4ee..6c8c91086f323 100644
--- a/gcc/lra-eliminations.cc
+++ b/gcc/lra-eliminations.cc
@@ -1415,6 +1415,14 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   if (frame_pointer_needed || !targetm.frame_pointer_required ())
     return 0;
   gcc_assert (!elimination_fp2sp_occured_p);
+  ep = elimination_map[FRAME_POINTER_REGNUM];
+  if (ep->to == STACK_POINTER_REGNUM)
+    {
+      elimination_map[FRAME_POINTER_REGNUM] = NULL;
+      setup_can_eliminate (ep, false);
+    }
+  else
+    ep = NULL;
   if (lra_dump_file != NULL)
     fprintf (lra_dump_file,
              "Frame pointer can not be eliminated anymore\n");
@@ -1422,9 +1430,10 @@ lra_update_fp2sp_elimination (int *spilled_pseudos)
   CLEAR_HARD_REG_SET (set);
   add_to_hard_reg_set (&set, Pmode, HARD_FRAME_POINTER_REGNUM);
   n = spill_pseudos (set, spilled_pseudos);
-  for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
-    if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
-      setup_can_eliminate (ep, false);
+  if (!ep)
+    for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
+      if (ep->from == FRAME_POINTER_REGNUM && ep->to == STACK_POINTER_REGNUM)
+        setup_can_eliminate (ep, false);
   return n;
 }
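The control flow of the hunks above can be modelled outside of GCC. The following is a rough sketch under stated assumptions, not the LRA implementation: Reg, Elimination, EliminationState and disable_fp2sp are hypothetical names, the table holds just two entries, spilling is left out, and a null check is added that the real code does not have. It only illustrates the ordering the patch introduces: if fp2sp is the active elimination for the frame pointer, it is dropped from the map and marked unusable before anything else runs, and the table scan is only the fallback.

```cpp
// Hypothetical register numbering for the model.
enum Reg { FRAME_POINTER, STACK_POINTER, HARD_FRAME_POINTER, NUM_REGS };

// One row of a (much simplified) elimination table.
struct Elimination
{
  Reg from;
  Reg to;
  bool can_eliminate;
};

// Simplified state: a table of candidate eliminations plus, per register,
// a pointer to the elimination currently in use (or nullptr).
struct EliminationState
{
  Elimination table[2] = {
    { FRAME_POINTER, STACK_POINTER, true },
    { FRAME_POINTER, HARD_FRAME_POINTER, true },
  };
  Elimination *active[NUM_REGS] = {};
};

// Model of the patched logic: inactivate fp->sp right away when it is
// the active elimination; otherwise fall back to scanning the table.
void
disable_fp2sp (EliminationState &s)
{
  Elimination *ep = s.active[FRAME_POINTER];
  if (ep && ep->to == STACK_POINTER)
    {
      s.active[FRAME_POINTER] = nullptr;  // nothing can use it from here on
      ep->can_eliminate = false;
    }
  else
    ep = nullptr;

  // (In LRA, pseudos living in the hard frame pointer are spilled here.)

  if (!ep)
    for (Elimination &e : s.table)
      if (e.from == FRAME_POINTER && e.to == STACK_POINTER)
        e.can_eliminate = false;
}
```

In the model, as in the patch, the fp to hard-fp entry is left untouched in both paths.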
[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 0 with GR2VR cost 0, 2 and 15
From: Pan Li

Add asm dump check test for vec_duplicate + vmaxu.vv combine to vmaxu.vx,
with GR2VR cost 0, 2 and 15.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
    for vmaxu.vx combine.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u8.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u8.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u8.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary.h: Add test
    helper macros.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
    data for run test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u16.c: New test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u32.c: New test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u64.c: New test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u8.c: New test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u16.c: New test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u32.c: New test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u64.c: New test.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u8.c: New test.

Signed-off-by: Pan Li
---
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h       |  10 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u16.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u32.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u64.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u8.c      |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u16.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u32.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u64.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u8.c      |  17 ++
 22 files changed, 378 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u8.c

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
index 474fed2be15..11848f8f8e1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c
@@ -13,6 +13,8 @@ DEF_VX_BINARY_CASE_0_WRAP(T, |, or)
 DEF_VX_BINARY_CASE_0_WRAP(T, ^, xor)
 DEF_VX_BINARY_CASE_0_WRAP(T, /, div)
 DEF_VX_BINARY_CASE_0_WRAP(T, %, rem)
+DEF_VX_BINARY_CASE_2_WRAP(T, MAX_FUNC_0_WARP(T), max)
+DEF_VX_BINARY_CASE_2_WRAP(T, MAX_FUNC_1_WARP(T), max)
 
 /* { dg-final { scan-assembler-times {vadd.vx} 1 } } */
 /* { dg-final { scan-assembler-times {vsub.vx} 1 } } */
@@ -22,3 +24,4 @@ DEF_VX_BINARY_CASE_0_WRAP(T, %, rem)
 /* { dg-final { scan-assembler-times {vxor.vx} 1 } } */
 /* { dg-final { scan-assembler-times {vdivu.vx} 1 } } */
 /* { dg-final { scan-assembler-times {vremu.vx} 1 } } */
+/* { dg-final { scan-assembler-times {vmaxu.vx} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
index 28c0524c993..b1e42ecb5dd 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c
+++ b
[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost
From: Pan Li

This patch would like to introduce the combine of vec_dup + vmaxu.vv
into vmaxu.vx on the cost value of GR2VR.  The late-combine will take
place if the cost of GR2VR is zero, or reject the combine if non-zero
like 1, 2, 15 in test.  There will be two cases for the combine:

Case 0:
 |   ...
 |   vmv.v.x
 | L1:
 |   vmaxu.vv
 |   J L1
 |   ...

Case 1:
 |   ...
 | L1:
 |   vmv.v.x
 |   vmaxu.vv
 |   J L1
 |   ...

Both will be combined to below if the cost of GR2VR is zero.
 |   ...
 | L1:
 |   vmaxu.vx
 |   J L1
 |   ...

The below test suites are passed for this patch series.
* The rv64gcv full regression test.

Pan Li (3):
  RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost
  RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 0 with GR2VR
    cost 0, 2 and 15
  RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 1 with GR2VR
    cost 0, 1 and 2

 gcc/config/riscv/riscv-v.cc                   |   2 +
 gcc/config/riscv/riscv.cc                     |   1 +
 gcc/config/riscv/vector-iterators.md          |   4 +-
 .../riscv/rvv/autovec/vx_vf/vx-1-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-1-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-2-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-3-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-4-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-5-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u16.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u32.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u64.c        |   3 +
 .../riscv/rvv/autovec/vx_vf/vx-6-u8.c         |   3 +
 .../riscv/rvv/autovec/vx_vf/vx_binary.h       |  10 +
 .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u16.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u32.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u64.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-1-u8.c      |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u16.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u32.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u64.c     |  17 ++
 .../rvv/autovec/vx_vf/vx_vmax-run-2-u8.c      |  17 ++
 37 files changed, 419 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u16.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u32.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u64.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u8.c

-- 
2.43.0
[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost
From: Pan Li

This patch combines vec_duplicate + vmaxu.vv into vmaxu.vx, as in the
example code below.  The related pattern depends on the cost of
vec_duplicate from GR2VR.  The late-combine will take action if the
cost of GR2VR is zero, and reject the combination if the GR2VR cost is
greater than zero.

Assume we have example code like below, GR2VR cost is 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, /)

Before this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    vsetvli a5,zero,e32,m1,ta,ma
    vmv.v.x v2,a2
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vmaxu.vv v1,v1,v2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
    beq a3,zero,.L8
    slli a3,a3,32
    srli a3,a3,32
  .L3:
    vsetvli a5,a3,e32,m1,ta,ma
    vle32.v v1,0(a1)
    slli a4,a5,2
    sub a3,a3,a5
    add a1,a1,a4
    vmaxu.vx v1,v1,a2
    vse32.v v1,0(a0)
    add a0,a0,a4
    bne a3,zero,.L3

gcc/ChangeLog:

    * config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
    new case UMAX.
    (expand_vx_binary_vec_vec_dup): Ditto.
    * config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
    * config/riscv/vector-iterators.md: Add new op umax.

Signed-off-by: Pan Li
---
 gcc/config/riscv/riscv-v.cc          | 2 ++
 gcc/config/riscv/riscv.cc            | 1 +
 gcc/config/riscv/vector-iterators.md | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index ef69991b431..011594966d3 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -5538,6 +5538,7 @@ expand_vx_binary_vec_dup_vec (rtx op_0, rtx op_1, rtx op_2,
     case XOR:
     case MULT:
     case SMAX:
+    case UMAX:
       icode = code_for_pred_scalar (code, mode);
       break;
     case MINUS:
@@ -5573,6 +5574,7 @@ expand_vx_binary_vec_vec_dup (rtx op_0, rtx op_1, rtx op_2,
     case MOD:
     case UMOD:
     case SMAX:
+    case UMAX:
       icode = code_for_pred_scalar (code, mode);
       break;
     default:
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index f6608bd872b..74462cc76a5 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -3979,6 +3979,7 @@ riscv_rtx_costs (rtx x, machine_mode mode, int outer_code, int opno ATTRIBUTE_UN
         case XOR:
         case MULT:
         case SMAX:
+        case UMAX:
           {
             rtx op;
             rtx op_0 = XEXP (x, 0);
diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md
index c9b836cc042..1e048c190a8 100644
--- a/gcc/config/riscv/vector-iterators.md
+++ b/gcc/config/riscv/vector-iterators.md
@@ -4042,11 +4042,11 @@ (define_code_iterator any_int_binop [plus minus and ior xor ashift ashiftrt lshi
 ])
 
 (define_code_iterator any_int_binop_no_shift_v_vdup [
-  plus minus and ior xor mult div udiv mod umod smax
+  plus minus and ior xor mult div udiv mod umod smax umax
 ])
 
 (define_code_iterator any_int_binop_no_shift_vdup_v [
-  plus minus and ior xor mult smax
+  plus minus and ior xor mult smax umax
 ])
 
 (define_code_iterator any_int_unop [neg not])
-- 
2.43.0
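The DEF_VX_BINARY macro quoted in the commit message takes an infix operator, so it cannot express max directly. As an illustration only, here is a self-contained sketch of the kind of source loop that is a candidate for the vmaxu.vx form: an unsigned element-wise maximum against a loop-invariant scalar. The names umax_u32 and test_vx_umax_u32 are made up for this note and are not the testsuite's helper macros, and __restrict is a compiler extension in C++. Whether the compiler actually emits vmaxu.vx depends on the target options and on the GR2VR cost discussed above.

```cpp
#include <cstdint>

// Unsigned maximum of two 32-bit values.
static inline std::uint32_t
umax_u32 (std::uint32_t a, std::uint32_t b)
{
  return a > b ? a : b;
}

// out[i] = max (in[i], x) for an unsigned element type.  The scalar x is
// loop invariant, which is what makes a vector-scalar instruction usable
// in place of a broadcast (vmv.v.x) followed by a vector-vector max.
void
test_vx_umax_u32 (std::uint32_t *__restrict out,
                  const std::uint32_t *__restrict in,
                  std::uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = umax_u32 (in[i], x);
}
```

Because the comparison is unsigned, an input of 0xFFFFFFFF compares greater than any small x, where a signed max would treat it as -1.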
[PATCH, 4 of 4] Use vector pair for memory operations with -mcpu=future
This is patch #4 of 4 to add -mcpu=future support to the PowerPC.

In the development for the power10 processor, GCC did not enable using the load
vector pair and store vector pair instructions when optimizing things like
memory copy.  This patch enables using those instructions if -mcpu=future is
used.

I have tested these patches on both big endian and little endian PowerPC
servers, with no regressions.  Can I check these patches into the trunk?

2025-06-13  Michael Meissner

gcc/

    * config/rs6000/rs6000-cpus.def (ISA_FUTURE_MASKS_SERVER): Enable using
    load vector pair and store vector pair instructions for memory copy
    operations.
    (POWERPC_MASKS): Make the bit for enabling using load vector pair and
    store vector pair operations set and reset when the PowerPC processor is
    changed.
    * gcc/config/rs6000/rs6000.cc (rs6000_machine_from_flags): Disable
    -mblock-ops-vector-pair from influencing .machine selection.

gcc/testsuite/

    * gcc.target/powerpc/future-3.c: New test.
---
 gcc/config/rs6000/rs6000-cpus.def           |  4 +++-
 gcc/config/rs6000/rs6000.cc                 |  2 +-
 gcc/testsuite/gcc.target/powerpc/future-3.c | 22 +
 3 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/future-3.c

diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 228d0b5e7b5..063591f5c09 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -84,7 +84,8 @@
                                  | OPTION_MASK_POWER11)
 
 #define FUTURE_MASKS_SERVER     (POWER11_MASKS_SERVER                   \
-                                 | OPTION_MASK_FUTURE)
+                                 | OPTION_MASK_FUTURE                   \
+                                 | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR)
 
 /* Flags that need to be turned off if -mno-vsx.  */
 #define OTHER_VSX_VECTOR_MASKS  (OPTION_MASK_EFFICIENT_UNALIGNED_VSX    \
@@ -114,6 +115,7 @@
 
 /* Mask of all options to set the default isa flags based on -mcpu=.  */
 #define POWERPC_MASKS           (OPTION_MASK_ALTIVEC                    \
+                                 | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR    \
                                  | OPTION_MASK_CMPB                     \
                                  | OPTION_MASK_CRYPTO                   \
                                  | OPTION_MASK_DFP                      \
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 141d53b1a12..80fc500fcec 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -5907,7 +5907,7 @@ rs6000_machine_from_flags (void)
 
   /* Disable the flags that should never influence the .machine selection.  */
   flags &= ~(OPTION_MASK_PPC_GFXOPT | OPTION_MASK_PPC_GPOPT | OPTION_MASK_ISEL
-             | OPTION_MASK_ALTIVEC);
+             | OPTION_MASK_ALTIVEC | OPTION_MASK_BLOCK_OPS_VECTOR_PAIR);
 
   if ((flags & (FUTURE_MASKS_SERVER & ~ISA_3_1_MASKS_SERVER)) != 0)
     return "future";
diff --git a/gcc/testsuite/gcc.target/powerpc/future-3.c b/gcc/testsuite/gcc.target/powerpc/future-3.c
new file mode 100644
index 000..afa8b96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/future-3.c
@@ -0,0 +1,22 @@
+/* 32-bit doesn't generate vector pair instructions.  */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-mdejagnu-cpu=future -O2" } */
+
+/* Test to see that memcpy will use load/store vector pair with
+   -mcpu=future.  */
+
+#ifndef SIZE
+#define SIZE 4
+#endif
+
+extern vector double to[SIZE], from[SIZE];
+
+void
+copy (void)
+{
+  __builtin_memcpy (to, from, sizeof (to));
+  return;
+}
+
+/* { dg-final { scan-assembler {\mlxvpx?\M} } } */
+/* { dg-final { scan-assembler {\mstxvpx?\M} } } */
-- 
2.49.0

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
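The new test relies on the PowerPC-only `vector double` type, so it cannot be built elsewhere. For anyone wanting to look at the same copy on another machine, a portable approximation might look like the sketch below. Vec2d is a hypothetical 16-byte stand-in for `vector double` and is an assumption of this note, not part of the patch. With SIZE 4 the copy moves 64 bytes, which is the kind of fixed-size block copy the patch lets the compiler expand with load/store vector pair instructions under -mcpu=future. On other targets it simply compiles to whatever that target's memcpy expansion produces.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical portable stand-in for PowerPC's 'vector double':
// two doubles, 16 bytes, 16-byte aligned.
struct alignas (16) Vec2d
{
  double lane[2];
};

constexpr std::size_t SIZE = 4;

Vec2d to[SIZE], from[SIZE];

// Fixed-size block copy, same shape as copy () in future-3.c.
void
copy ()
{
  std::memcpy (to, from, sizeof (to));
}
```

The dg-final directives in the real test then only check that the lxvp/stxvp family shows up in the assembly; nothing comparable is checked here.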
[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 1 with GR2VR cost 0, 1 and 2
From: Pan Li

Add asm dump check test for vec_duplicate + vmaxu.vv combine to vmaxu.vx,
with GR2VR cost 0, 1 and 2.

gcc/testsuite/ChangeLog:

    * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check
    for vmaxu.vx combine.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
    * gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li
---
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c  | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c  | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c | 3 +++
 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c  | 3 +++
 12 files changed, 36 insertions(+)

diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
index 16ccaea251b..bee4171c0b4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c
@@ -13,6 +13,8 @@ DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X16)
 DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X16)
+DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_0_WARP(T), max, VX_BINARY_FUNC_BODY_X8)
+DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY_X8)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -22,3 +24,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X16)
 /* { dg-final { scan-assembler {vxor.vx} } } */
 /* { dg-final { scan-assembler {vdivu.vx} } } */
 /* { dg-final { scan-assembler {vremu.vx} } } */
+/* { dg-final { scan-assembler {vmaxu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
index 0e2ab8d7838..376f1c63ff1 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c
@@ -13,6 +13,8 @@ DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY_X4)
 DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X4)
+DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_0_WARP(T), max, VX_BINARY_FUNC_BODY_X4)
+DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY_X4)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -22,3 +24,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY_X4)
 /* { dg-final { scan-assembler {vxor.vx} } } */
 /* { dg-final { scan-assembler {vdivu.vx} } } */
 /* { dg-final { scan-assembler {vremu.vx} } } */
+/* { dg-final { scan-assembler {vmaxu.vx} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
index 80eb8a4752e..034f50dfe63 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c
@@ -13,6 +13,8 @@ DEF_VX_BINARY_CASE_1_WRAP(T, |, or, VX_BINARY_BODY)
 DEF_VX_BINARY_CASE_1_WRAP(T, ^, xor, VX_BINARY_BODY)
 DEF_VX_BINARY_CASE_1_WRAP(T, /, div, VX_BINARY_BODY)
 DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY)
+DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_0_WARP(T), max, VX_BINARY_FUNC_BODY)
+DEF_VX_BINARY_CASE_3_WRAP(T, MAX_FUNC_1_WARP(T), max, VX_BINARY_FUNC_BODY)
 
 /* { dg-final { scan-assembler {vadd.vx} } } */
 /* { dg-final { scan-assembler {vsub.vx} } } */
@@ -22,3 +24,4 @@ DEF_VX_BINARY_CASE_1_WRAP(T, %, rem, VX_BINARY_BODY)
 /* { dg-final { scan-assembler {vxor.vx} } } */
 /* { dg-final { scan-assembler {vdivu.vx} } } */
 /* { dg-final { scan-asse
Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost
LGTM

Written on Saturday, 14 June 2025, 22:38:

> From: Pan Li
>
> This patch would like to introduce the combine of vec_dup + vmaxu.vv
> into vmaxu.vx on the cost value of GR2VR.  The late-combine will take
> place if the cost of GR2VR is zero, or reject the combine if non-zero
> like 1, 2, 15 in test.  There will be two cases for the combine:
>
> Case 0:
>  |   ...
>  |   vmv.v.x
>  | L1:
>  |   vmaxu.vv
>  |   J L1
>  |   ...
>
> Case 1:
>  |   ...
>  | L1:
>  |   vmv.v.x
>  |   vmaxu.vv
>  |   J L1
>  |   ...
>
> Both will be combined to below if the cost of GR2VR is zero.
>  |   ...
>  | L1:
>  |   vmaxu.vx
>  |   J L1
>  |   ...
>
> The below test suites are passed for this patch series.
> * The rv64gcv full regression test.
>
> Pan Li (3):
>   RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost
>   RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 0 with GR2VR
>     cost 0, 2 and 15
>   RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 1 with GR2VR
>     cost 0, 1 and 2
>
>  gcc/config/riscv/riscv-v.cc                   |   2 +
>  gcc/config/riscv/riscv.cc                     |   1 +
>  gcc/config/riscv/vector-iterators.md          |   4 +-
>  .../riscv/rvv/autovec/vx_vf/vx-1-u16.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-1-u32.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-1-u64.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-1-u8.c         |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-2-u16.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-2-u32.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-2-u64.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-2-u8.c         |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-3-u16.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-3-u32.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-3-u64.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-3-u8.c         |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-4-u16.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-4-u32.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-4-u64.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-4-u8.c         |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-5-u16.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-5-u32.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-5-u64.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-5-u8.c         |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-6-u16.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-6-u32.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-6-u64.c        |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx-6-u8.c         |   3 +
>  .../riscv/rvv/autovec/vx_vf/vx_binary.h       |  10 +
>  .../riscv/rvv/autovec/vx_vf/vx_binary_data.h  | 196 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-1-u16.c     |  17 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-1-u32.c     |  17 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-1-u64.c     |  17 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-1-u8.c      |  17 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-2-u16.c     |  17 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-2-u32.c     |  17 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-2-u64.c     |  17 ++
>  .../rvv/autovec/vx_vf/vx_vmax-run-2-u8.c      |  17 ++
>  37 files changed, 419 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u16.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u64.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-1-u8.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u16.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u32.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u64.c
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vx_vf/vx_vmax-run-2-u8.c
>
> --
> 2.43.0
>
>
[PATCH, 1 of 4] Add -mcpu=future support for PowerPC
This is patch #1 of 4 that adds the support that can be used in developing GCC
support for future PowerPC processors.

I have tested these patches on both big endian and little endian PowerPC
servers, with no regressions.  Can I check these patches into the trunk?

2025-06-13  Michael Meissner

    * config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
    * config/rs6000/aix71.h (ASM_CPU_SPEC): Add support for -mcpu=future.
    * config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
    * config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
    * config/rs6000/driver-rs6000.cc (asm_names): Likewise.
    * config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): If
    -mcpu=future, define _ARCH_FUTURE.
    * config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
    (POWERPC_MASKS): Add OPTION_MASK_FUTURE.
    (future cpu): Define.
    * config/rs6000/rs6000-opts.h (enum processor_type): Add
    PROCESSOR_FUTURE.
    * config/rs6000/rs6000-tables.opt: Regenerate.
    * config/rs6000/rs6000.cc (power10_cost): Update comment.
    (get_arch_flags): Add support for future processor.
    (rs6000_option_override_internal): Likewise.
    (rs6000_machine_from_flags): Likewise.
    (rs6000_reassociation_width): Likewise.
    (rs6000_adjust_cost): Likewise.
    (rs6000_issue_rate): Likewise.
    (rs6000_sched_reorder): Likewise.
    (rs6000_sched_reorder2): Likewise.
    (rs6000_register_move_cost): Likewise.
    (rs6000_opt_masks): Add -mfuture.
    * config/rs6000/rs6000.h (ASM_CPU_SPEC): Likewise.
    * config/rs6000/rs6000.md (cpu attribute): Likewise.
    * config/rs6000/rs6000.opt (-mfuture): New internal option.
---
 gcc/config.gcc                      |  4 ++--
 gcc/config/rs6000/aix71.h           |  1 +
 gcc/config/rs6000/aix72.h           |  1 +
 gcc/config/rs6000/aix73.h           |  1 +
 gcc/config/rs6000/driver-rs6000.cc  |  2 ++
 gcc/config/rs6000/rs6000-c.cc       |  2 ++
 gcc/config/rs6000/rs6000-cpus.def   |  5 +
 gcc/config/rs6000/rs6000-opts.h     |  1 +
 gcc/config/rs6000/rs6000-tables.opt | 11 +++
 gcc/config/rs6000/rs6000.cc         | 30 +
 gcc/config/rs6000/rs6000.h          |  1 +
 gcc/config/rs6000/rs6000.md         |  2 +-
 gcc/config/rs6000/rs6000.opt        |  6 ++
 13 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 8365b917068..7674cafa8ea 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -533,7 +533,7 @@ powerpc*-*-*)
         extra_headers="${extra_headers} ppu_intrinsics.h spu2vmx.h vec_types.h si2vmx.h"
         extra_headers="${extra_headers} amo.h"
         case x$with_cpu in
-            xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+            xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower1[01]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500|xfuture)
                 cpu_is_64bit=yes
                 ;;
         esac
@@ -5694,7 +5694,7 @@ case "${target}" in
                         tm_defines="${tm_defines} CONFIG_PPC405CR"
                         eval "with_$which=405"
                         ;;
-                "" | common | native \
+                "" | common | native | future \
                 | power[3456789] | power1[01] | power5+ | power6x \
                 | powerpc | powerpc64 | powerpc64le \
                 | rs64 \
diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 2b21dd7cd1e..77651f5ea30 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -79,6 +79,7 @@ do {                                                                  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix72.h b/gcc/config/rs6000/aix72.h
index 53c0bde5ad4..652f60c7f49 100644
--- a/gcc/config/rs6000/aix72.h
+++ b/gcc/config/rs6000/aix72.h
@@ -79,6 +79,7 @@ do {                                                                  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/aix73.h b/gcc/config/rs6000/aix73.h
index c7639368a26..3c66ac1d917 100644
--- a/gcc/config/rs6000/aix73.h
+++ b/gcc/config/rs6000/aix73.h
@@ -79,6 +79,7 @@ do {                                                                  \
 #undef ASM_CPU_SPEC
 #define ASM_CPU_SPEC \
 "%{mcpu=native: %(asm_cpu_native); \
+  mcpu=future: -mfuture; \
   mcpu=power11: -mpwr11; \
   mcpu=power10: -mpwr10; \
   mcpu=power9: -mpwr9; \
diff --git a/gcc/config/rs6000/driver-rs6000.cc b/gcc/config/rs
[PATCH, 2 of 4] Add tuning support for -mcpu=future
This is patch #2 of 4 to add -mcpu=future support to the PowerPC. This patch makes -mtune=future use the same tuning decisions as -mtune=power10 or -mtune=power11. I have tested these patches on both big endian and little endian PowerPC servers, with no regressions. Can I check these patches into the trunk? 2025-06-13 Michael Meissner gcc/ * config/rs6000/power10.md (all reservations): Add future as an alternative to power10 and power11. --- gcc/config/rs6000/power10.md | 145 ++- 1 file changed, 73 insertions(+), 72 deletions(-) diff --git a/gcc/config/rs6000/power10.md b/gcc/config/rs6000/power10.md index fd31b16b331..bdd7e58145b 100644 --- a/gcc/config/rs6000/power10.md +++ b/gcc/config/rs6000/power10.md @@ -1,4 +1,5 @@ -;; Scheduling description for the IBM Power10 and Power11 processors. +;; Scheduling description for the IBM Power10, Power11, and +;; potential future processors. ;; Copyright (C) 2020-2025 Free Software Foundation, Inc. ;; ;; Contributed by Pat Haugen (pthau...@us.ibm.com). 
@@ -97,12 +98,12 @@ (define_insn_reservation "power10-load" 4 (eq_attr "update" "no") (eq_attr "size" "!128") (eq_attr "prefixed" "no") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_any_power10,LU_power10") (define_insn_reservation "power10-fused-load" 4 (and (eq_attr "type" "fused_load_cmpi,fused_addis_load,fused_load_load") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10") (define_insn_reservation "power10-prefixed-load" 4 @@ -110,13 +111,13 @@ (define_insn_reservation "power10-prefixed-load" 4 (eq_attr "update" "no") (eq_attr "size" "!128") (eq_attr "prefixed" "yes") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10") (define_insn_reservation "power10-load-update" 4 (and (eq_attr "type" "load") (eq_attr "update" "yes") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10+SXU_power10") (define_insn_reservation "power10-fpload-double" 4 @@ -124,7 +125,7 @@ (define_insn_reservation "power10-fpload-double" 4 (eq_attr "update" "no") (eq_attr "size" "64") (eq_attr "prefixed" "no") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_any_power10,LU_power10") (define_insn_reservation "power10-prefixed-fpload-double" 4 @@ -132,14 +133,14 @@ (define_insn_reservation "power10-prefixed-fpload-double" 4 (eq_attr "update" "no") (eq_attr "size" "64") (eq_attr "prefixed" "yes") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10") (define_insn_reservation "power10-fpload-update-double" 4 (and (eq_attr "type" "fpload") (eq_attr "update" "yes") (eq_attr "size" "64") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10+SXU_power10") ; SFmode loads are cracked and have additional 3 cycles over DFmode @@ -148,27 +149,27 @@ 
(define_insn_reservation "power10-fpload-single" 7 (and (eq_attr "type" "fpload") (eq_attr "update" "no") (eq_attr "size" "32") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10") (define_insn_reservation "power10-fpload-update-single" 7 (and (eq_attr "type" "fpload") (eq_attr "update" "yes") (eq_attr "size" "32") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10+SXU_power10") (define_insn_reservation "power10-vecload" 4 (and (eq_attr "type" "vecload") (eq_attr "size" "!256") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_any_power10,LU_power10") ; lxvp (define_insn_reservation "power10-vecload-pair" 4 (and (eq_attr "type" "vecload") (eq_attr "size" "256") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,LU_power10+SXU_power10") ; Store Unit @@ -178,12 +179,12 @@ (define_insn_reservation "power10-store" 0 (eq_attr "prefixed" "no") (eq_attr "size" "!128") (eq_attr "size" "!256") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_any_power10,STU_power10") (define_insn_reservation "power10-fused-store" 0 (and (eq_attr "type" "fused_store_store") - (eq_attr "cpu" "power10,power11")) + (eq_attr "cpu" "power10,power11,future")) "DU_even_power10,STU_power10") (define_insn_reservation "power10-prefixed-store" 0 @@ -191,52 +192,52 @@
[PATCH, 3 of 4] Add -mcpu=future tests
This is patch #3 of 4 to add -mcpu=future support to the PowerPC. This patch adds simple tests for -mcpu=future. I have tested these patches on both big endian and little endian PowerPC servers, with no regressions. Can I check these patches into the trunk? 2025-06-13 Michael Meissner gcc/testsuite/ * gcc.target/powerpc/future-1.c: New test. * gcc.target/powerpc/future-2.c: Likewise. --- gcc/testsuite/gcc.target/powerpc/future-1.c | 13 +++ gcc/testsuite/gcc.target/powerpc/future-2.c | 24 + 2 files changed, 37 insertions(+) create mode 100644 gcc/testsuite/gcc.target/powerpc/future-1.c create mode 100644 gcc/testsuite/gcc.target/powerpc/future-2.c diff --git a/gcc/testsuite/gcc.target/powerpc/future-1.c b/gcc/testsuite/gcc.target/powerpc/future-1.c new file mode 100644 index 000..f1b940d7beb --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/future-1.c @@ -0,0 +1,13 @@ +/* { dg-do compile } */ +/* { dg-options "-mdejagnu-cpu=future -O2" } */ + +/* Basic check to see if the compiler supports -mcpu=future and if it defines + _ARCH_PWR11. */ + +#ifndef _ARCH_FUTURE +#error "-mcpu=future is not supported" +#endif + +void foo (void) +{ +} diff --git a/gcc/testsuite/gcc.target/powerpc/future-2.c b/gcc/testsuite/gcc.target/powerpc/future-2.c new file mode 100644 index 000..5552cefa3c2 --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/future-2.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +/* Check if we can set the future target via a target attribute. */ + +__attribute__((__target__("cpu=power9"))) +void foo_p9 (void) +{ +} + +__attribute__((__target__("cpu=power10"))) +void foo_p10 (void) +{ +} + +__attribute__((__target__("cpu=power11"))) +void foo_p11 (void) +{ +} + +__attribute__((__target__("cpu=future"))) +void foo_future (void) +{ +} -- 2.49.0 -- Michael Meissner, IBM PO Box 98, Ayer, Massachusetts, USA, 01432 email: meiss...@linux.ibm.com
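As a rough illustration of how user code might react to the macro this series introduces, the sketch below selects a name at compile time. It assumes only that `_ARCH_FUTURE` is defined under -mcpu=future (as the ChangeLog for patch #1 states) and that the existing `_ARCH_PWR10` and `_ARCH_PWR9` macros are available on PowerPC targets; `target_arch_name` is a hypothetical helper and not part of GCC.

```cpp
// Hypothetical helper (not part of GCC): report which PowerPC architecture
// level the translation unit was compiled for, based on predefined macros.
// _ARCH_FUTURE is the macro the patch series defines for -mcpu=future.
const char *target_arch_name()
{
#if defined(_ARCH_FUTURE)
  return "future";
#elif defined(_ARCH_PWR10)
  return "power10";
#elif defined(_ARCH_PWR9)
  return "power9";
#else
  // Non-PowerPC hosts and older PowerPC levels end up here.
  return "other";
#endif
}
```

The checks run from newest to oldest because a newer CPU setting normally defines the older `_ARCH_PWR*` macros as well.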
Re: [patch,avr,v15] PR120423, PR116389
Georg-Johann Lay writes: > This patch is to avoid PR120423 and PR116389 on avr. > > The PRs are about paradoxical subregs that ICE after old reload > as follows: > > For rtxes like (subreg:HI (QI) 0), the inner rtx may be reloaded to > (reg:QI 31), which is fine, but the paradoxical subreg will be > changed to (reg:HI 31), which is invalid since all hard regs that > hold more than one byte must start in an even register. > > Reload calls targetm.hard_regno_mode_ok on the QI reg several > times and in several places, but in no case does it call that hook > on the HImode register. Moreover, struct reload carries > no information that the rtx is a subreg, so there is no simple fix in > reload. > > The patch passes without new regressions on avr + v15. It is not > activated when LRA is on (-mlra). The PR120423 test cases generate > the same code with the patch as with -mlra, so the patch does not > even introduce a performance penalty. > > The patch is only for v15, though it still works with trunk. > I have no plans to apply it on trunk, but if you would like to see it > on trunk, too, that is no problem. The patch is of course only for > lack of a better patch. > > Ok for v15? Ok. Please apply. Denis.
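The register constraint described in the quoted message can be sketched as a toy model. This is an illustration only, not the avr backend's actual hook: `toy_hard_regno_mode_ok` and `toy_paradoxical_subreg_ok` are hypothetical names, and mode sizes are given in bytes rather than as machine modes.

```cpp
// Toy model (hypothetical, not GCC code): a value wider than one byte must
// start in an even-numbered hard register; single bytes may live anywhere.
constexpr bool toy_hard_regno_mode_ok(unsigned regno, unsigned mode_bytes)
{
  return mode_bytes <= 1 || regno % 2 == 0;
}

// Turning (subreg:HI (reg:QI N) 0) into (reg:HI N) keeps the register number
// but widens the mode, so the check must be made with the outer size.
constexpr bool toy_paradoxical_subreg_ok(unsigned inner_regno,
                                         unsigned outer_bytes)
{
  return toy_hard_regno_mode_ok(inner_regno, outer_bytes);
}

// (reg:QI 31) is acceptable on its own ...
static_assert(toy_hard_regno_mode_ok(31, 1), "QI fits in any register");
// ... but widening it in place to HImode is not.
static_assert(!toy_paradoxical_subreg_ok(31, 2), "HI may not start at r31");
static_assert(toy_paradoxical_subreg_ok(30, 2), "HI may start at r30");
```

The model shows why checking only the inner QImode register is not enough: the same register number passes for one byte and fails for two.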
[PATCH] c++, coroutines: Avoid UNKNOWN_LOCATION synthesizing code [PR120273].
Hi Jason, >>+ point to the closing brace. */ >>+ input_location = fn_end; >If we're going to have the loc variable at all, how about adjusting it here... Done. >> resume_fn_ptr, zero_resume); >...so you don't need to change these uses... >> finish_expr_stmt (zero_resume); >>- finish_expr_stmt (build_init_or_final_await (fn_start, true)); >>+ finish_expr_stmt (build_init_or_final_await (fn_end, true)); >...and use it here as well. Done here too - now adjusted to use 'loc' consistently. We still have to reset input_location, because of other code that might refer to it directly. OK for trunk now? thanks Iain --- 8< --- Some of the lookup code is expecting to find a valid (not UNKNOWN) location, which triggers in the reported case. To avoid this, we are reverting the change to use UNKNOWN_LOCATION for synthesizing the wrapper, and instead using the start and end locations of the original function. PR c++/120273 gcc/cp/ChangeLog: * coroutines.cc (cp_coroutine_transform::wrap_original_function_body): Use function start and end locations when synthesizing code. (cp_coroutine_transform::cp_coroutine_transform): Set the function end location. * coroutines.h: Add the function end location. gcc/testsuite/ChangeLog: * g++.dg/coroutines/coro-missing-final-suspend.C: Adjust for changed final suspend diagnostics line number change. * g++.dg/coroutines/coro1-missing-await-method.C: Likewise. * g++.dg/coroutines/pr104051.C: Likewise. * g++.dg/coroutines/pr120273.C: New test. 
Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 22 --- gcc/cp/coroutines.h | 1 + .../coroutines/coro-missing-final-suspend.C | 4 +- .../coroutines/coro1-missing-await-method.C | 2 +- gcc/testsuite/g++.dg/coroutines/pr104051.C| 4 +- gcc/testsuite/g++.dg/coroutines/pr120273.C| 58 +++ 6 files changed, 77 insertions(+), 14 deletions(-) create mode 100644 gcc/testsuite/g++.dg/coroutines/pr120273.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index 1fbdee1b4f6..6518e9202d0 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -4307,8 +4307,7 @@ cp_coroutine_transform::wrap_original_function_body () { /* Avoid the code here attaching a location that makes the debugger jump. */ iloc_sentinel stable_input_loc (fn_start); - location_t loc = UNKNOWN_LOCATION; - input_location = loc; + location_t loc = fn_start; /* This will be our new outer scope. */ tree update_body @@ -4450,7 +4449,7 @@ cp_coroutine_transform::wrap_original_function_body () /* If the coroutine has a frame that needs to be freed, this will be set by the ramp. */ - var = coro_build_artificial_var (fn_start, coro_frame_needs_free_id, + var = coro_build_artificial_var (loc, coro_frame_needs_free_id, boolean_type_node, orig_fn_decl, NULL_TREE); DECL_CHAIN (var) = var_list; var_list = var; @@ -4462,7 +4461,7 @@ cp_coroutine_transform::wrap_original_function_body () tree ueh = coro_build_promise_expression (orig_fn_decl, promise, coro_unhandled_exception_identifier, -fn_start, NULL, /*musthave=*/true); +loc, NULL, /*musthave=*/true); /* Create and initialize the initial-await-resume-called variable per [dcl.fct.def.coroutine] / 5.3. 
*/ tree i_a_r_c @@ -4524,9 +4523,9 @@ cp_coroutine_transform::wrap_original_function_body () tree ueh_meth = lookup_promise_method (orig_fn_decl, coro_unhandled_exception_identifier, -fn_start, /*musthave=*/false); +loc, /*musthave=*/false); if (!ueh_meth || ueh_meth == error_mark_node) - warning_at (fn_start, 0, "no member named %qE in %qT", + warning_at (loc, 0, "no member named %qE in %qT", coro_unhandled_exception_identifier, get_coroutine_promise_type (orig_fn_decl)); } @@ -4539,6 +4538,10 @@ cp_coroutine_transform::wrap_original_function_body () add_stmt (return_void); } + /* We are now doing actions associated with the end of the function, so + point to the closing brace. */ + input_location = loc = fn_end; + /* co_return branches to the final_suspend label, so declare that now. */ fs_label = create_named_label_with_ctx (loc, "final.suspend", NULL_TREE); @@ -4550,7 +4553,7 @@ cp_coroutine_transform::wrap_original_function_body () zero_resume = build2_loc (loc, MODIFY_EXPR, act_des_fn_ptr_type, resume_fn_ptr, zero_resume); finish_expr_stmt (zero_resume); - finish_expr_stmt (build_init_
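The location handling in this patch relies on a scoped save and restore of the global input location, with a local `loc` that starts at the function start and is switched to the function end. The following is a minimal stand-alone sketch of that pattern; every name in it (`toy_location`, `toy_input_location`, `toy_loc_sentinel`, `toy_wrap_body`) is hypothetical and only mimics the shape of the GCC code, not its real types.

```cpp
// Toy stand-ins (hypothetical): a location is just a number, 0 is "unknown".
using toy_location = unsigned;
constexpr toy_location TOY_UNKNOWN_LOCATION = 0;
static toy_location toy_input_location = TOY_UNKNOWN_LOCATION;

// RAII guard: set the global location on entry, restore the old one on exit.
class toy_loc_sentinel
{
  toy_location saved_;
public:
  explicit toy_loc_sentinel(toy_location loc) : saved_(toy_input_location)
  {
    toy_input_location = loc;
  }
  ~toy_loc_sentinel() { toy_input_location = saved_; }
  toy_loc_sentinel(const toy_loc_sentinel &) = delete;
  toy_loc_sentinel &operator=(const toy_loc_sentinel &) = delete;
};

// Mimics the shape of the wrapper: synthesize "start" code at fn_start, then
// move both the local and the global location to fn_end for the final part.
toy_location toy_wrap_body(toy_location fn_start, toy_location fn_end)
{
  toy_loc_sentinel guard(fn_start);
  toy_location loc = fn_start;
  // ... code synthesized for the start of the function would use loc here ...
  toy_input_location = loc = fn_end;
  // ... code for the final suspend point would use loc here ...
  return loc;
}
```

Because the guard restores the previous value on scope exit, callers see the same global location after the call as before it, while code inside never observes an unknown location.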
[PATCH] c++, coroutines: Handle unevaluated contexts.
Hi Jason, >>It seems that we had not been marking typeid expressions as unevaluated >>so that is also added here. >This seems to be https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68604 >But only some typeid expressions are unevaluated; the >https://eel.is/c++draft/expr#typeid-4 case (polymorphic glvalue) is not. > >In any case let's not mess with it in this patch. Removed, added a note and xfailed the test for it, so that once we get a resolution to PR68604 it should just start XPASSing and we're done. OK for trunk? thanks Iain --- 8< --- From [expr.await]/2 We should not accept co_await, co_yield in unevaluated contexts. Currently (see PR68604) we do not mark typeid expressions as unevaluated since the standard rules mean that this depends on the value type. gcc/cp/ChangeLog: * coroutines.cc (finish_co_await_expr): Do not allow in an unevaluated context. (finish_co_yield_expr): Likewise. gcc/testsuite/ChangeLog: * g++.dg/coroutines/unevaluated.C: New test. Signed-off-by: Iain Sandoe --- gcc/cp/coroutines.cc | 12 + gcc/testsuite/g++.dg/coroutines/unevaluated.C | 25 +++ 2 files changed, 37 insertions(+) create mode 100644 gcc/testsuite/g++.dg/coroutines/unevaluated.C diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc index a1986282ca7..6b3fe540376 100644 --- a/gcc/cp/coroutines.cc +++ b/gcc/cp/coroutines.cc @@ -1549,6 +1549,12 @@ finish_co_await_expr (location_t kw, tree expr) if (!expr || error_operand_p (expr)) return error_mark_node; + if (cp_unevaluated_operand) +{ + error_at (kw, "%qs cannot be used in an unevaluated context","co_await"); + return error_mark_node; +} + if (!coro_common_keyword_context_valid_p (current_function_decl, kw, "co_await")) return error_mark_node; @@ -1629,6 +1635,12 @@ finish_co_yield_expr (location_t kw, tree expr) if (!expr || error_operand_p (expr)) return error_mark_node; + if (cp_unevaluated_operand) +{ + error_at (kw, "%qs cannot be used in an unevaluated context","co_yield"); + return error_mark_node; +} + /* Check the 
general requirements and simple syntax errors. */ if (!coro_common_keyword_context_valid_p (current_function_decl, kw, "co_yield")) diff --git a/gcc/testsuite/g++.dg/coroutines/unevaluated.C b/gcc/testsuite/g++.dg/coroutines/unevaluated.C new file mode 100644 index 000..63dae38dea3 --- /dev/null +++ b/gcc/testsuite/g++.dg/coroutines/unevaluated.C @@ -0,0 +1,25 @@ +// { dg-additional-options "-fsyntax-only" } +#include +#include + +struct Task { +struct promise_type { +promise_type() = default; +Task get_return_object() { return {}; } +std::suspend_never initial_suspend() { return {}; } +std::suspend_always final_suspend() noexcept { return {}; } +void unhandled_exception() {} +void return_void () {} +std::suspend_never yield_value (int) { return {}; } +}; +}; + +// We do not permit co_await, co_yield outside a function, and so uses in +// noexcept or requirements are covered by that. +Task foo() { +/* This one will currently fail - see PR68604. */ +const std::type_info& ti1 = typeid (co_await std::suspend_never{}); // { dg-error {'co_await' cannot be used in an unevaluated context} "" { xfail *-*-* } } +std::size_t x = sizeof (co_yield (19)); // { dg-error {'co_yield' cannot be used in an unevaluated context} } +decltype (co_await std::suspend_never{}) A; // { dg-error {'co_await' cannot be used in an unevaluated context} } +co_return; +} -- 2.39.2 (Apple Git-143)
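For readers less familiar with the term, the sketch below shows what "unevaluated" means for the operators exercised by the new test: the operand of `sizeof`, `decltype` and `noexcept` is analysed for its type but never executed. It deliberately avoids coroutines so that it builds without coroutine support, and `toy_calls`, `toy_bump` and `toy_unevaluated_demo` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Counts how often toy_bump is actually executed.
static int toy_calls = 0;
int toy_bump() { return ++toy_calls; }

// Uses toy_bump() only as an unevaluated operand and returns the call count.
int toy_unevaluated_demo()
{
  std::size_t n = sizeof(toy_bump());   // only the type (int) is inspected
  decltype(toy_bump()) v = 0;           // declares an int, no call happens
  // The operand of noexcept is unevaluated too; toy_bump is not noexcept.
  static_assert(!noexcept(toy_bump()), "toy_bump may throw in principle");
  (void) n;
  (void) v;
  return toy_calls;
}
```

Since a `co_await` or `co_yield` placed in such an operand would never run, there is no suspension point to build, which is consistent with the patch rejecting them there.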
Re: [gcc-wwwdocs PATCH] gcc-15: Correct DMR ISA base platform to include AMX-COMPLEX
On Fri, 13 Jun 2025, Haochen Jiang wrote: > I just found that since AMX-COMPLEX is enabled on Diamond Rapids but > not enabled on Granite Rapids, we should use the ISA level from > Granite Rapids D instead of Granite Rapids to show that. > > Since Diamond Rapids is the actual successor of Granite Rapids but > not Granite Rapids D, I slightly tweak the word here just like what > we did in GCC13 for Sierra Forest. I trust you have the details of your extensions down. :) Okay. Gerald