Re: [PATCH v1 0/4] RISC-V: Combine vec_duplicate + vdiv.vv to vdiv.vx on GR2VR cost

2025-06-03 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vdiv.vv into vdiv.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 15 in test. There will be two cases for the combine: The series is OK, thanks.

Re: [PATCH v2 1/2] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-06-03 Thread Robin Dapp
This series is OK now, thanks. -- Regards Robin

Re: [PATCH] RISC-V: Support CPUs in -march.

2025-06-03 Thread Robin Dapp
1. riscv64-linux-gcc -march=rv64gc -march=foo-cpu -mtune=foo-cpu 2. riscv64-linux-gcc -march=rv64gc -march=foo-cpu 3. riscv64-linux-gcc -march=rv64gc -march=unset -mtune=unset -mcpu=foo-cpu Preference to me: - Prefer option 1. - Less prefer option 3. (acceptable but I don't like) - Strongly disli

Re: [PATCH] RISC-V: Support CPUs in -march.

2025-06-02 Thread Robin Dapp
I don't quite follow this part. IIUC the rules before this patch were -march=ISA: Generate code that requires the given ISA, without changing the tuning model. -mcpu=CPU: Generate code for the given CPU, targeting all the extensions that CPU supports and using the best known tu

Re: [PATCH] RISC-V: Support CPUs in -march.

2025-06-01 Thread Robin Dapp
This rule clearly applies to directly related options like -ffoo and -fno-foo, but it’s less obvious for unrelated pairs like -ffoo and -fbar especially when there is traditionally strong specifics. In many cases, the principle of "the most specific option wins" governs the behavior. Here

Re: [PATCH] RISC-V: Support CPUs in -march.

2025-06-01 Thread Robin Dapp
I stumbled across this change from https://github.com/riscv-non-isa/riscv-toolchain-conventions/issues/88 and I want to express my strong disagreement with this change. Perhaps I'm accustomed to Arm's behavior, but I believe using -march= to target a specific CPU isn't ideal. * -march=X: (exe

Re: [PATCH v1] RISC-V: Fix line too long format issue for autovect.md [NFC]

2025-05-31 Thread Robin Dapp
Inspired by the avg_ceil patches, I noticed there were even more overlong lines in autovec.md, so fix those format issues. OK. -- Regards Robin

Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-05-30 Thread Robin Dapp
Hi Paul-Antoine, overall the patch looks reasonable to me now, provided the fr2vr followup. BTW it's the late-combine pass that performs the optimization, not the combine pass. You might still want to fix this in the commit message. Please CC patchworks...@rivosinc.com for the next version

Re: [PATCH v1 0/3] Refine the avg_ceil with fixed point vaadd

2025-05-30 Thread Robin Dapp
Looks like the CI cannot tell patch series? There are 3 patches and the CI will run for each one. Of course, the first one will have scan failure due to expanding change, but the second one reconciles them. Finally the third one will have all test passed as below, I think it indicates all test

Re: [PATCH v1 0/3] Refine the avg_ceil with fixed point vaadd

2025-05-30 Thread Robin Dapp
Similar to the avg_floor, the avg_ceil has the rounding mode towards +inf, while the vaadd.vv has the rnu which totally matches the semantics. From the RVV spec, the fixed-point vaadd.vv with rnu, The CI shows some scan failures in vls/avg-[456].c and widen/vec-avg-rv32gcv.c. Also, the lint check complains
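The claimed match between avg_ceil and vaadd.vv with rnu can be illustrated with a scalar model in C. This is only a sketch for 8-bit elements. It assumes rnu adds the most significant discarded bit back after the shift, as the RVV spec describes for fixed-point rounding. The function names are made up for illustration and are not GCC or intrinsic APIs.

```c
#include <stdint.h>

/* Reference semantics of avg_ceil: (a + b + 1) >> 1, computed in a wider
   type so the sum cannot overflow.  Relies on >> being an arithmetic
   shift for negative values, which holds for GCC.  */
static int8_t
avg_ceil_ref (int8_t a, int8_t b)
{
  int16_t sum = (int16_t) a + (int16_t) b;
  return (int8_t) ((sum + 1) >> 1);
}

/* Hypothetical scalar model of vaadd.vv with vxrm = rnu: shift the sum
   right by one and add back the bit that was shifted out.  */
static int8_t
vaadd_rnu_model (int8_t a, int8_t b)
{
  int16_t sum = (int16_t) a + (int16_t) b;
  int16_t round = sum & 1;
  return (int8_t) ((sum >> 1) + round);
}
```

Under these assumptions both functions return the same value for every pair of int8_t inputs, for example -1 for (-3, 0) and 2 for (1, 2).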

Re: [PATCH] testsuite: RISC-V: Fix the typo in param-autovec-mode.c

2025-05-28 Thread Robin Dapp
This patch fixes the typo in the test case `param-autovec-mode.c` in the RISC-V autovec testsuite. The option `autovec-mode` is changed to `riscv-autovec-mode` to match the expected parameter name. OK of course :) -- Regards Robin

Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vmul.vv to vmul.vx on GR2VR cost

2025-05-28 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vmul.vv into vmul.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 15 in test. There will be two cases for the combine: OK. -- Regards Robin

Re: [PATCH v2 0/3] Refine the avg_floor with fixed point vaadd

2025-05-28 Thread Robin Dapp
LGTM, thanks. -- Regards Robin

[PATCH v2 0/3] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-05-27 Thread Robin Dapp
The first patch makes SLP paths unreachable and the second one removes those entirely. The third patch does the actual strided-load work. Bootstrapped and regtested on x86 and aarch64. Regtested on rv64gcv_zvl512b. Robin Dapp (3): vect: Make non-SLP paths unreachable in strided slp

[PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-05-27 Thread Robin Dapp
From: Robin Dapp This patch enables strided loads for VMAT_STRIDED_SLP. Instead of building vectors from scalars or other vectors we can use strided loads directly when applicable. The current implementation limits strided loads to cases where we can load entire groups and not subsets of them

[PATCH v2 2/3] vect: Remove non-SLP paths in strided slp/elementwise.

2025-05-27 Thread Robin Dapp
This removes the non-SLP paths that were made unreachable in the previous patch. gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_load): Remove non-SLP paths. --- gcc/tree-vect-stmts.cc | 49 -- 1 file changed, 18 insertions(+), 31 deletions(-) d

[PATCH v2 1/3] vect: Make non-SLP paths unreachable in strided slp/elementwise.

2025-05-27 Thread Robin Dapp
From: Robin Dapp This replaces if (slp) with if (1) and if (!slp) with if (0). gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_load): Make non-SLP paths unreachable. --- gcc/tree-vect-stmts.cc | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc

Re: [PATCH 2/2] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-05-27 Thread Robin Dapp
This mangles in the non-SLP path removal, can you please separate that out? So should patch 1/2 do more than it does, i.e. fully remove the non-slp paths rather than just if (0) them? -- Regards Robin

Re: [PATCH 2/2] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-05-27 Thread Robin Dapp
That would be appreciated (but is of course a larger task - I was fine with the partial thing you did). Ok. Then to move things forward I'll do a 2/3 for this one first. Once we're through the review cycle for the series I can work on the non-slp removal for the full function. -- Regards R

Re: [PATCH 2/2] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-05-27 Thread Robin Dapp
On Tue, May 27, 2025 at 2:44 PM Robin Dapp wrote: > This mangles in the non-SLP path removal, can you please separate that > out? So should patch 1/2 do more than it does, i.e. fully remove the non-slp paths rather than just if (0) them? There should be a separate 2/3 that does thi

[PATCH] RISC-V: Avoid division by zero in check_builtin_call [PR120436].

2025-05-27 Thread Robin Dapp
Hi, in check_builtin_call we eventually perform a division by zero when no vector modes are present. This patch just avoids the division in that case. Regtested on rv64gcv_zvl512b. I guess this is obvious enough that it can be pushed after the CI approves. Regards Robin PR target/1

Re: [PATCH v1 1/3] RISC-V: Leverage vaadd.vv for signed standard name avg_floor

2025-05-26 Thread Robin Dapp
-(define_expand "avg3_floor" - [(set (match_operand: 0 "register_operand") - (truncate: -(ashiftrt:VWEXTI - (plus:VWEXTI - (sign_extend:VWEXTI - (match_operand: 1 "register_operand")) - (sign_extend:VWEXTI - (match_operand: 2 "register_operand"))] +(define_expan

Re: simple frm save/restore strategy (was Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn)

2025-05-26 Thread Robin Dapp
2. OK'ish: A bunch of testcases see more reads/writes as PRE of redundant reads/writes is punted to later passes which obviously needs more work. 3. NOK: We lose the ability to instrument local RM writes - especially in the testsuite. e.g. a. intrinsic setting a static RM b. get_frm

Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vxor.vv to vxor.vx on GR2VR cost

2025-05-26 Thread Robin Dapp
OK, thanks. -- Regards Robin

Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vor.vv to vor.vx on GR2VR cost

2025-05-23 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vor.vv into vor.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 15 in test. There will be two cases for the combine: OK, thanks. -- Regards Robin

Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

2025-05-22 Thread Robin Dapp
AFAICT the main difference to standard mode switching is that we (ab)use it to set the rounding mode to the value it had initially, either at function entry or after a call.  That's different to regular mode switching which assumes "static" rounding modes for different instructions. Standard c

Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-05-22 Thread Robin Dapp
Hi Paul-Antoine, Please find attached a revised version of the patch. Compared to the previous iteration, I have: * Rebased on top of Pan's work; * Updated the cost model; * Added a second pattern to handle the case where PLUS_MINUS operands are swapped; * Added compile and run tests. I boot

Re: [PATCH] RISC-V: Add autovec mode param.

2025-05-21 Thread Robin Dapp
Could you make a simple testcase that could vectorize two loops in different modes (e.g. one SI and one SF) and with this param will only auto-vectorize one loop? I added a test now in the attached v2 that checks that we vectorize with the requested mode. Right now the patch only takes away "additiona

Re: [PATCH] RISC-V: Support CPUs in -march.

2025-05-21 Thread Robin Dapp
I could imagine that is a simpler way to set the march since the march string becomes terribly long - we have an arch string of more than 300 chars... so I support this, although I think this should be discussed with the LLVM community, but I think it's fine to accept as a GCC extension. So LGTM, go ahead t

[PATCH] RISC-V: Support CPUs in -march.

2025-05-21 Thread Robin Dapp
Hi, This patch allows an -march string like -march=sifive-p670 in order to allow overriding a previous -march in a simple way. Suppose we have a Makefile that specifies -march=rv64gc by default. A user-specified -mcpu=sifive-p670 would be after the -march in the options string and thus only s

[PATCH] RISC-V: Default-initialize variable.

2025-05-21 Thread Robin Dapp
Hi, this patch initializes saved_vxrm_mode to VXRM_MODE_NONE. This is a warning (but no error) when building the compiler so better fix it. Regtested on rv64gcv_zvl512b. Going to commit as obvious if the CI is happy. Regards Robin gcc/ChangeLog: * config/riscv/riscv.cc (singleton_vx

[PATCH] RISC-V: Add autovec mode param.

2025-05-21 Thread Robin Dapp
Hi, This patch adds a --param=autovec-mode=. When the param is specified we make autovectorize_vector_modes return exactly this mode if it is available. This helps when testing different vectorizer settings. Regtested on rv64gcv_zvl512b. Regards Robin gcc/ChangeLog: * config/riscv/r

Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vand.vv to vand.vx on GR2VR cost

2025-05-21 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vand.vv into vand.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 15 in test. There will be two cases for the combine: OK, thanks. -- Regards Rob

Re: [PATCH 3/6] RISC-V: frm/mode-switch: remove dubious frm edge insertion before call_insn

2025-05-20 Thread Robin Dapp
Maybe I'm missing something there. Particularly whether or not you can know anything about frm's value after a call has returned. Normally the answer to this kind of question is a hard no. AFAICT the main difference to standard mode switching is that we (ab)use it to set the rounding mode to

[PATCH 2/2] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-05-20 Thread Robin Dapp
This patch enables strided loads for VMAT_STRIDED_SLP. Instead of building vectors from scalars or other vectors we can use strided loads directly when applicable. The current implementation limits strided loads to cases where we can load entire groups and not subsets of them. A future improveme

[PATCH 0/2] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-05-20 Thread Robin Dapp
The second patch adds strided-load support for strided-slp memory access. The first patch makes the respective non-slp paths unreachable. Robin Dapp (2): vect: Remove non-SLP paths in strided slp and elementwise. vect: Use strided loads for VMAT_STRIDED_SLP. gcc/internal-fn.cc

[PATCH 1/2] vect: Remove non-SLP paths in strided slp and elementwise.

2025-05-20 Thread Robin Dapp
This replaces if (slp) with if (1) and if (!slp) with if (0). gcc/ChangeLog: * tree-vect-stmts.cc (vectorizable_load): Make non-slp paths unreachable. --- gcc/tree-vect-stmts.cc | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/gcc/tree-vect-stmts

Re: [PATCH v1 0/8] RISC-V: Combine vec_duplicate + vrsub.vv to vrsub.vx on GR2VR cost

2025-05-19 Thread Robin Dapp
The series LGTM. I didn't check all the tests in detail to be honest :) -- Regards Robin

Re: [PATCH][RFC] Allow the target to request a masked vector epilogue

2025-05-16 Thread Robin Dapp
I was thinking of adding a vectorization_mode class that would encapsulate the mode and whether to allow masking or alternatively to make the vector_modes array (and the m_suggested_epilogue_mode) a std::pair of mode and mask flag? Without having a very strong opinion (or the full background) on

Re: [PATCH v1 00/10] RISC-V: Combine vec_duplicate + vsub.vv to vsub.vx on GR2VR cost

2025-05-16 Thread Robin Dapp
Excuse the delay, I was attending the RISC-V Summit Europe. The series LGTM. -- Regards Robin

Re: [PATCH v1 0/7] RISC-V: Combine vec_duplicate + vsub.vv to vsub.vx on GR2VR cost

2025-05-12 Thread Robin Dapp
I think we need the run tests for each op combine up to a point. But for asm check, Seems we can put it together? I mean something like below: +/* { dg-do compile } */ +/* { dg-options "-march=rv64gcv -mabi=lp64d --param=gpr2vr-cost=0" } */ + +#include "vx_binary.h" + +DEF_VX_BINARY_CASE_0(int3

Re: [PATCH v1 0/7] RISC-V: Combine vec_duplicate + vsub.vv to vsub.vx on GR2VR cost

2025-05-12 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vsub.vv into vsub.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 15 in test. There will be two cases for the combine: The changes to add are very

Re: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + vadd.vv combine

2025-05-08 Thread Robin Dapp
it's just a vector cost model issue and some loops are not profitable to vectorize? Yes. For example, when gpr2vr is 1, int8_t cannot vectorize while uint8_t can. OK, understood. I think that's expected given the fine granularity of the tests. IMHO nothing that should block progress. -- R

Re: [PATCH v1 0/5] Add testcases for another case of vec_duplicate + vadd.vv combine

2025-05-08 Thread Robin Dapp
This patch series would like to add the testcases for this. However, some test results are not that tidy, and we need more tuning for the vector cost model. The test adjustments LGTM but what do you mean by not tidy? I see you're scanning just for the presence of "vx" instead of an exact numbe

Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-05-07 Thread Robin Dapp
Thanks Jeff. I will rebase and update my patch. One question though, I noticed that Pan's patch introduced a command-line parameter to tweak the GR2VR cost; do we need something equivalent for FR2VR? Yes, we need it in order to be able to test both paths, i.e. combining and not combining. Als

Re: [PATCH v4 2/6] RISC-V: Add gr2vr cost helper function

2025-05-06 Thread Robin Dapp
+/* + * Return the cost of operation that move from gpr to vr. + * + * It will take the value of --param=gpr2vr_cost if it is provided. + * Or the default regmove->GR2VR will be returned. + */ Please still remove the leading '*' of the comment. The series is OK with that fixed. Thanks for you

Re: [PATCH v1 1/5] RISC-V: Add new option --param=rvv-gr2vr-cost= for rvv insn

2025-05-05 Thread Robin Dapp
1. those static const var initialized before options, can hardly initialize correct. 2. The --param is somehow experimental, thus I prefer to keep the const GR2VR in static structure as is. I will append a new patch,aka let the reference goes to the new helper if that is OK. Yes that should

Re: [PATCH v1 1/5] RISC-V: Add new option --param=rvv-gr2vr-cost= for rvv insn

2025-05-05 Thread Robin Dapp
Hi Pan, While investigating the combine of vec_dup and vop.vv into vop.vx, we need to depend on the cost of insns that operate from GR to VR, for example, vadd.vx. Thus, for better control and testing, we introduce a new option as below: --param=rvv-gr2vr-cost= +static inline int +get_vec

Re: [PATCH] RISC-V: Allow different dynamic floating point mode to be merged [PR119832]

2025-04-29 Thread Robin Dapp
Although we already try to set the mode needed to FRM_DYN after a function call, there are still some corner cases where both FRM_DYN and FRM_DYN_CALL may appear on incoming edges. Therefore, we use TARGET_MODE_CONFLUENCE to tell GCC that FRM_DYN, FRM_DYN_CALL, and FRM_DYN_EXIT modes are compatib

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-29 Thread Robin Dapp
I see, let the vec_dup enter the rtx_cost again to append the total to vmv, I have a try testing. For example with below change: + switch (rcode) + { + case VEC_DUPLICATE: + *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1); + break; +

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-28 Thread Robin Dapp
But this is not that good enough here if my understanding is correct. As vmv.v.x is somehow equivalent to vec_dup but doesn't ref GR2VR, But it should. Can't we do something like: if (riscv_v_ext_mode_p (mode)) { switch (GET_CODE (x)) { case VEC_DUPLICATE:

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-28 Thread Robin Dapp
Makes sense to me, it looks like the combine will always take place if GR2VR is 0, 1 or 2 for now. I tried to customize the cost here to make the combine fail, but that did not work with the change below. + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (GET_MODE (XEXP (x, 0 { +cost_val = 1; +

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-24 Thread Robin Dapp
Ah, I see, thanks. So vec_dup costs 1 + 2 and vadd.vv costs 1 totalling 4 while vadd.vx costs 1 + 2, making it cheaper? Yes, looks we need to just assign the GR2VR when vec_dup. I also tried diff cost here to see the impact to late-combine. + if (rcode == VEC_DUPLICATE && SCALAR_INT_MODE_P (
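The cost arithmetic in the question above can be written out as a small C sketch. It assumes a simplified model in which every RVV instruction has a base cost of 1 and a GPR-to-vector move adds gr2vr on top. The helper names are hypothetical and do not correspond to functions in riscv.cc.

```c
/* Hypothetical model: vec_duplicate pays the base cost plus the
   GPR -> vector register transfer.  */
static int
cost_vec_dup (int gr2vr)
{
  return 1 + gr2vr;
}

/* A plain vector-vector op such as vadd.vv only pays the base cost.  */
static int
cost_vv (void)
{
  return 1;
}

/* The combined vector-scalar op such as vadd.vx pays the base cost plus
   the transfer, but only once.  */
static int
cost_vx (int gr2vr)
{
  return 1 + gr2vr;
}

/* The combine looks profitable when the single .vx insn is cheaper than
   the vec_dup + .vv pair.  */
static int
combine_is_cheaper (int gr2vr)
{
  return cost_vx (gr2vr) < cost_vec_dup (gr2vr) + cost_vv ();
}
```

With gr2vr = 2 this gives 4 for the two-instruction sequence and 3 for vadd.vx. Under this particular model the .vx form is cheaper for any gr2vr value, since the pair always costs exactly one more than the single insn.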

Re: [PATCH v1 0/4] Refactor long function expand_const_vector

2025-04-23 Thread Robin Dapp
These patches LGTM from myside. But please wait for other folks to comment. The series LGTM as well. But please wait with merging until GCC 15.1 is released (as requested by the release maintainers). -- Regards Robin

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-23 Thread Robin Dapp
The only thing I think we want for the patch (as Pan also raised last time) is the param to set those .vx costs to zero in order to ensure the tests test the right thing (--param=vx_preferred/gr2vr_cost or something). I see, shall we start a new series for this? AFAIK, we may need some more al

Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost

2025-04-22 Thread Robin Dapp
/* TODO: We set RVV instruction cost as 1 by default. Cost Model need to be well analyzed and supported in the future. */ + int cost_val = 1; + enum rtx_code rcode = GET_CODE (x); + + /* Aka (vec_duplicate:RVVM1DI (reg/v:DI 143 [ x ])) */ + if (rcode == VEC_DUPLICATE && SCALAR_INT_MO

Re: [PATCH 3/3][GCC16-Stage-1] RISC-V: Add testcases for vec_duplicate + vadd.vv combine to vadd.vx

2025-04-17 Thread Robin Dapp
Hi Pan, I am not sure if we have some options additional to below, like -march=generic, to ensure that the late-combine will take action as expected in testcases. +/* { dg-options "-march=rv64gcv -mabi=lp64d" } */ I haven't gone through the rest yet (will take some more days) but yes, I agr

Re: [PATCH v2] RISC-V: vsetvl: elide abnormal edges from LCM computations [PR119533]

2025-04-15 Thread Robin Dapp
The solution is to filter out abnormal edges from getting into LCM at all. Existing invalid_opt_bb_p () has such checks for BB predecessors but not for successors which is what the patch adds. OK. -- Regards Robin

[PATCH] expr: Use constant_lower_bound classifying constructor els [PR116595].

2025-04-10 Thread Robin Dapp
Hi, in categorize_ctor_elements_1 we do VECTOR_CST_NELTS (value).to_constant () but VALUE's type can be a VLA vector (since r15-5780-g17b520a10cdaab). This patch uses constant_lower_bound instead. Bootstrapped and regtested on x86, aarch64, and power 10. Regtested on rv64gcv_zvl512b. Regards

Re: [PATCH] riscv: Fix r15-9270 fallout on RISC-V

2025-04-10 Thread Robin Dapp
Tested with compilation of x86_64-linux -> riscv64-linux cross, ok for trunk? Yes. -- Regards Robin

Re: [PATCH v1] RISC-V: Refine the testcases for cond_widen_complicate-3

2025-04-09 Thread Robin Dapp
I see, reverted. Thanks Robin for reminder. Thanks! BTW and just for open discussion, is this really a good way for such kind of tests? Though most of the tests are similar like this but it may hide possible unexpected results up to a point. Yeah we have several flaky tests and in those cas

[PATCH v2] RISC-V: Do not lift up vsetvl into non-transparent blocks [PR119547].

2025-04-09 Thread Robin Dapp
Hi, when lifting up a vsetvl into a block we currently don't consider the block's transparency with respect to the vsetvl as in other parts of the pass. This patch does not perform the lift when transparency is not guaranteed. This condition is more restrictive than necessary as we can still pe

Re: [PATCH v1] RISC-V: Refine the testcases for cond_widen_complicate-3

2025-04-09 Thread Robin Dapp
Hi Pan, Richard committed combine patches that restored most of the previous behavior so we shouldn't need the refinement any more. AFAICT the tests should now pass in their previous state but definitely fail in their current state. Do you want to revert this change? Thanks. -- Regards Robi

[PATCH] testsuite: Add -mabi to pr116595.C

2025-04-09 Thread Robin Dapp
Hi, as usual, I forgot to add -mabi=lp64d to the test case. This patch adds it. Going to push as obvious. Regards Robin gcc/testsuite/ChangeLog: * g++.target/riscv/rvv/autovec/pr116595.C: Add -mabi. --- gcc/testsuite/g++.target/riscv/rvv/autovec/pr116595.C | 2 +- 1 file changed, 1 in

Re: vsetvl abormal edge (was Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533])

2025-04-08 Thread Robin Dapp
On 4/8/25 16:32, Vineet Gupta wrote: Yay ! It does work. Awesome. I've uploaded the further reduced test to PR/119533 Hmm, I'm seeing the same ICE as before with my patch. Did you happen to change something else on your local tree still? Yeah I had some debug stuff lying around. In particular

Re: vsetvl abormal edge (was Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533])

2025-04-08 Thread Robin Dapp
Yay ! It does work. Awesome. I've uploaded the further reduced test to PR/119533 Hmm, I'm seeing the same ICE as before with my patch. Did you happen to change something else on your local tree still? On top, I'm now seeing a ton of vsetvl test failures vs just the one I reported... No ide

Re: vsetvl abormal edge (was Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533])

2025-04-08 Thread Robin Dapp
Yay ! It does work. Awesome. I've uploaded the further reduced test to PR/119533 Hmm, I'm seeing the same ICE as before with my patch. Did you happen to change something else on your local tree still? On top, I'm now seeing a ton of vsetvl test failures vs just the one I reported... No ide

Re: vsetvl abormal edge (was Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533])

2025-04-08 Thread Robin Dapp
Yay ! It does work. Awesome. I've uploaded the further reduced test to PR/119533 Hmm, I'm seeing the same ICE as before with my patch. Did you happen to change something else on your local tree still? -- Regards Robin

Re: vsetvl abormal edge (was Re: [PATCH v2] RISC-V: vsetvl: skip abnormal edge on vsetvl insertion [PR119533])

2025-04-08 Thread Robin Dapp
Hi Vineet, However we still see lift up using those blocks - the earliest set computed contained the supposedly elided bbs.   Try lift up 0.   earliest:     Edge(bb 16 -> bb 17): n_bits = 3, set = {1 }   Try lift up 1.   earliest:     Edge(bb 15 -> bb

[PATCH] RISC-V: Do not lift up vsetvl if its VL is used [PR119547].

2025-04-06 Thread Robin Dapp
Hi, before lifting up a vsetvl (that saves VL in a register) to a block we need to ensure that this register is not live in the block. Otherwise we would overwrite the register. There is some conceptual similarity to LCM's transparency property (or ANTLOC) which deals with overwriting an expres

[PATCH] RISC-V: Fix vec_duplicate[bimode] expander [PR119572].

2025-04-02 Thread Robin Dapp
Hi, since r15-9062-g70391e3958db79 we perform vector bitmask initialization via the vec_duplicate expander directly. This triggered a latent bug in ours where we failed to mask out the single bit, which resulted in an execution FAIL of pr119114.c. The attached patch adds the 1-masking of the broa

Re: [PATCH v3] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-04-01 Thread Robin Dapp
Note it's not quite "whatever" -- there is a constraint that vl be monotonically nonincreasing, which in some cases is the only important property. No denying this is an annoyance, though. Yes, I was hoping the smiley would convey that "whatever" was not to be taken literally. In terms of SC

Re: [PATCH] [testsuite] [riscv] xfail some [PR113281] tests

2025-03-31 Thread Robin Dapp
Some of the tests regressed with a fix for the vectorization of shifts. The riscv cost models need to be adjusted to avoid the unprofitable optimization. The failure of these tests has been known since 2024-03-13, without a forthcoming fix, so I suggest we consider it expected by now. Adjust th

Re: [PATCH v3] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-03-31 Thread Robin Dapp
Yeah...and I also don't like the magic "ceil(AVL / 2) ≤ vl ≤ VLMAX if AVL < (2 * VLMAX)" rule... +1, spec has some description about this but I am not sure if I really get the point. From Spec: "For example, this permits an implementation to set vl = ceil(AVL / 2) for VLMAX <
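The vl rule discussed here can be expressed as a small checker. This is a sketch of the constraints on vl as stated in the RVV spec for vsetvl{i}: vl equals AVL when AVL <= VLMAX, lies between ceil(AVL / 2) and VLMAX when AVL < 2 * VLMAX, and equals VLMAX otherwise. The function name is made up for illustration.

```c
#include <stdbool.h>

/* Return true if an implementation may set VL for the requested AVL,
   given the maximum vector length VLMAX.  */
static bool
vl_is_permitted (unsigned avl, unsigned vlmax, unsigned vl)
{
  if (avl <= vlmax)
    return vl == avl;
  if (avl < 2 * vlmax)
    /* The "magic" middle range: anything from ceil (AVL / 2) to VLMAX.  */
    return vl >= (avl + 1) / 2 && vl <= vlmax;
  return vl == vlmax;
}
```

For AVL = 10 and VLMAX = 8, any vl from 5 to 8 is permitted, which is why code cannot assume vl == VLMAX in that range.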

Re: [PATCH v3] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-03-31 Thread Robin Dapp
LGTM (even though I still don't like the spec :D). We still have an implicit assumption in riscv-vsetvl.cc that might modify LMUL: In prev_ratio_valid_for_next_sew_p and next_ratio_valid_for_prev_sew_p we check whether the ratio of two LMULs is <= 8. ISTR that with recent changes we only re-u

Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-03-30 Thread Robin Dapp
So may be the way to go is add a field to the uarch tuning structure indicating the additional cost (if any) of a register file crossing vector op of this nature. Then query that in riscv_rtx_costs or whatever our rtx_cost function is named. Default that additional cost to zero initially. Th

Re: [PATCH] RISC-V: Add pattern for vector-scalar multiply-add/sub [PR119100]

2025-03-27 Thread Robin Dapp
Hi Paul-Antoine, This pattern enables the combine pass to merge a vec_duplicate into a plus-mult or minus-mult RTL instruction. Before this patch, we have two instructions, e.g.: vfmv.v.f v6,fa0 vfmadd.vv v9,v6,v7 After, we get only one: vfmadd.vf v9,fa0,v7 On SPEC201

Re: [PATCH v2] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-03-27 Thread Robin Dapp
This does not only happen on ELEN=32 and VLEN=32, it happened on all ELEN=32 arch, and one of our internal configurations hit this... Wait, is there something I keep missing? There must be I guess. Disregarding the SEW=8 case because that one is clear, but take for example: ENTRY (RVVMF4HI,

Re: [PATCH v2] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-03-25 Thread Robin Dapp
zve32x_zvl64b will have the same requirement as zve32x_zvl32b, I mean e16,mf4 could be allowed on zve32x_zvl64b, but it also spec conformance if implementation decides to raise an illegal instruction on e16,mf4, which means e16,mf4 is not safe to use on zve32x/zve32f. OK I see, thanks. Sometime

Re: [PATCH] RISC-V: disable the abd expander for gcc-15 release [PR119224]

2025-03-25 Thread Robin Dapp
- "TARGET_VECTOR" + "TARGET_VECTOR && 0" Would you mind adding a comment here before committing, maybe even reference the PR? Not that we want to keep this around for long anyway but just to make sure :) -- Regards Robin

Re: [PATCH v2] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-03-25 Thread Robin Dapp
Sorry Kito, that we're having so much back and forth here, it's not my intention to block anything (not that I could anyway). I just want to make sure I properly understand the rationale (or the spec, rather). Oh, ok, I got the point why you confused on this, the new condition is little bit `i

Re: [PATCH v2] RISC-V: Fix wrong LMUL when only implict zve32f.

2025-03-24 Thread Robin Dapp
Hi Kito, So the valid ranges of fractional LMUL for SEW=8, 16, 32 are: mf8 = [8, (1/8)*32] = [8, 4] = [], no SEW is valid with mf8 for ELEN = 32 mf4 = [8, (1/4)*32] = [8, 8] = only SEW 8 with mf4 is valid mf2 = [8, (1/2)*32] = [8, 16] = SEW 8 and 16 with mf2 are valid [1] https://github.com/riscvarchi
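The range computation above can be checked with a short C sketch. It assumes the rule quoted in the message, namely that for a fractional LMUL of 1/N the guaranteed SEW range is [8, ELEN / N]. The function name is hypothetical and is not part of GCC.

```c
#include <stdbool.h>

/* Return true if SEW is guaranteed to be usable with LMUL = 1/LMUL_DENOM
   on an implementation with the given ELEN.  The smallest SEW is 8.  */
static bool
frac_lmul_sew_valid (unsigned sew, unsigned lmul_denom, unsigned elen)
{
  return sew >= 8 && sew * lmul_denom <= elen;
}
```

For ELEN = 32 this reproduces the list above: nothing is valid with mf8, only SEW 8 with mf4, and SEW 8 and 16 with mf2.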

[PATCH] contrib: Wrap git repo access in gcc-changelog.

2025-03-13 Thread Robin Dapp
Hi, since updating to Fedora 41 I have been seeing ignored python exceptions like the following when using 'git gcc-verify' = contrib/gcc_changelog/git_check_commit.py. Checking 90fcc1f4f1a5537e8d30628895a07cbb2e7e16ff: OK Exception ignored in: Traceback (most recent call last): File "/usr/l

Re: [PATCH v1] RISC-V: Refine the testcases for cond_widen_complicate-3

2025-03-13 Thread Robin Dapp
I'm not opposed to refactoring but what's the reason for it? We have a large number of similar tests that also include all possible types. And aren't all the tests you touch FAILing anyway right now? (Due to the combine change...) Yes, the cond_widen_complicate-3 need some tweak for the asm

Re: [PATCH v1] RISC-V: Refine the testcases for cond_widen_complicate-3

2025-03-12 Thread Robin Dapp
From: Pan Li Rearrange the test cases of cond_widen_complicate-3 by different types into different files, instead of putting all types together. Then we can easily reduce the range when asm check fails. I'm not opposed to refactoring but what's the reason for it? We have a large number of simil

[PATCH] RISC-V: Mask values before initializing bitmask vector [PR119114].

2025-03-11 Thread Robin Dapp
Hi, in the somewhat convoluted vector code of PR119114 we are extracting a mask value from a vector mask. After some middle-end simplifications we end up with a value of -2. Its lowest bit is correctly unset representing "false". When initializing a bitmask vector from values we compare the full va
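The difference between testing the whole value and testing only its lowest bit can be shown with a scalar C sketch. This illustrates the arithmetic only, assuming the problem is a comparison of the full value against zero as the preview suggests. The helper names are made up and this is not the code from the patch.

```c
#include <stdbool.h>

/* Treat any nonzero value as "true".  For -2 this yields true even
   though the lowest bit, which carries the mask value, is unset.  */
static bool
mask_elt_full_compare (long v)
{
  return v != 0;
}

/* Mask the value with 1 first so only the lowest bit decides.  */
static bool
mask_elt_low_bit (long v)
{
  return (v & 1) != 0;
}
```

For the value -2 from the PR, the full comparison reports true while the low-bit test reports false, which is the intended mask value.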

[PATCH] RISC-V: Do not delete fused vsetvl if it has uses [PR119115].

2025-03-07 Thread Robin Dapp
Hi, in PR119115 we end up with an orphaned vsetvli zero,t1,e16,m1,ta,ma. t1 originally came from another vsetvl that was fused from vsetvli a4,a3,e8,mf2,ta,ma vsetvli t1,a3,e8,mf2,ta,ma (1) to vsetvli zero,a3,e16,m1,ta,ma. This patch checks if t1, the VL operand

[PATCH v2] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-03-05 Thread Robin Dapp
Hi, when merging two vsetvls that both only demand "SEW >= ..." we use their maximum SEW and keep the LMUL. That may lead to invalid vector configurations like e64, mf4. As we make sure that the SEW requirements overlap we can use the SEW and LMUL of the configuration with the larger SEW. Ma J
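
The problem can be sketched as follows. This is an illustration only, not the vsetvl pass: the validity rule (SEW between 8 and LMUL * ELEN, capped at ELEN), the choice ELEN = 64, the two example configurations, and all function names are assumptions made for the example.

```python
from fractions import Fraction

ELEN = 64  # assumed for this example


def is_valid(sew, lmul, elen=ELEN):
    """SEW must lie in [8, min(elen, lmul * elen)]."""
    return 8 <= sew <= min(elen, lmul * elen)


def merge_keep_lmul(prev, nxt):
    """Old behaviour as described: maximum SEW, LMUL of prev kept."""
    return (max(prev[0], nxt[0]), prev[1])


def merge_take_larger_sew(prev, nxt):
    """Described fix: SEW and LMUL of the configuration with the larger SEW."""
    return prev if prev[0] >= nxt[0] else nxt


prev = (8, Fraction(1, 4))  # e8, mf4 (hypothetical)
nxt = (64, Fraction(2))     # e64, m2 (hypothetical)

print(is_valid(*merge_keep_lmul(prev, nxt)))        # e64, mf4
print(is_valid(*merge_take_larger_sew(prev, nxt)))  # e64, m2
```

Keeping the LMUL produces e64, mf4, which fails the check, whereas taking both values from the configuration with the larger SEW reuses a pair that was already valid.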

Re: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-03-05 Thread Robin Dapp
Hi Jin, I apologize for the delayed response. I spent quite a bit of time trying to reproduce the case, and given the passage of time, it wasn't easy to refine the testing. Fortunately, you can see the results here. https://godbolt.org/z/Mc8veW7oT Using GCC version 14.2.0 should allow you to

Re: FRM ABI semantics (was Re: [PATCH v1] RISC-V: Make VXRM as global register [PR118103])

2025-03-04 Thread Robin Dapp
Yeah I didn't know how to articulate it (and perhaps this still requires clarification). Say we have the following: // reduced version of gcc.target/riscv/rvv/base/float-point-frm-run-1.c main set_frm (4); // orig global FRM update test_float_point_frm_run_1 (op1, op2, vl) set_fr

Re: [PATCH v1] RISC-V: Fix the test case bug-3.c failure

2025-03-03 Thread Robin Dapp
LGTM. -- Regards Robin

Re: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-02-28 Thread Robin Dapp
What we could do is prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ())); prev.set_vlmul (calculate_vlmul (prev.get_sew (), prev.get_ratio ())); No, that also doesn't work because the ratio can be invalid then. We fuse two vsetvls. One of them has a larger SEW which w

Re: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-02-28 Thread Robin Dapp
Okay, let me explain the background of my previous patch. Prior to applying my patch, for the test case bug-10.c (a reduced example of a larger program with incorrect runtime results), the vsetvli sequence compiled with --param=vsetvl-strategy=simple was as follows: 1. vsetvli zero,a4,e16,m4,ta

Re: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-02-28 Thread Robin Dapp
It seems the issue is we didn't set "vlmul" ? Can we do that: int max_sew = MAX (prev.get_sew (), next.get_sew ()); prev.set_sew (max_sew); prev.set_vlmul (calculate_vlmul (...)); prev.set_ratio (calculate_ratio (prev.get_sew (), prev.get_vlmul ())); What we could do is prev.set_ratio (cal

Re: [PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-02-27 Thread Robin Dapp
This patch modifies the sequence: vsetvli zero,a4,e32,m4,ta,ma + vsetvli zero,a4,e8,m2,ta,ma to: vsetvli zero,a4,e32,m8,ta,ma + vsetvli zero,zero,e8,m2,ta,ma Functionally, there is no difference. However, this change resolves the issue with "e64,mf4", and allows the second vsetvli to omit a4, wh
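
One way to read the sequence change is through the SEW/LMUL ratio. The `vsetvli zero,zero,...` form keeps the current VL and is intended for cases where VLMAX, and therefore the SEW/LMUL ratio, does not change. The sketch below only computes those ratios for the two sequences quoted above; the helper name is illustrative.

```python
from fractions import Fraction


def sew_lmul_ratio(sew, lmul):
    """Return SEW / LMUL as an exact fraction."""
    return Fraction(sew) / Fraction(lmul)


# Original sequence: e32,m4 followed by e8,m2.
before = (sew_lmul_ratio(32, 4), sew_lmul_ratio(8, 2))
# Modified sequence: e32,m8 followed by e8,m2.
after = (sew_lmul_ratio(32, 8), sew_lmul_ratio(8, 2))

print(before[0] == before[1])  # ratios 8 and 4 differ
print(after[0] == after[1])    # both ratios are 4
```

In the modified sequence both configurations share the ratio 4, which is consistent with the second vsetvli being able to drop the a4 operand.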

[PATCH] RISC-V: Adjust LMUL when using maximum SEW [PR117955].

2025-02-27 Thread Robin Dapp
Hi, when merging two vsetvls that both only demand "SEW >= ..." we use their maximum SEW and keep the LMUL. That may lead to invalid vector configurations like e64, mf4. As we make sure that the SEW requirements overlap we can use the SEW and LMUL of the configuration with the larger SEW. Ma J

Re: [PATCH v4] RISC-V: Fix bug for expand_const_vector interleave [PR118931]

2025-02-27 Thread Robin Dapp
Sure thing, I will send the v5 for the CI system and commit it if there are no surprises. BTW, shall we plan some refactoring of expand_const_vector in the next stage 1? It has grown to more than 500 lines and is unfriendly for debugging. Yeah, sounds very reasonable. -- Regards Robin

Re: [PATCH v4] RISC-V: Fix bug for expand_const_vector interleave [PR118931]

2025-02-27 Thread Robin Dapp
+/* { dg-do run { target { riscv_v } } } */ +/* { dg-options "-O3 -march=rv64gcv -flto -mrvv-vector-bits=zvl" } */ Ah, the CI flagged the test in previous versions. It's missing the usual -mabi=... I keep forgetting this... -- Regards Robin

Re: [PATCH v4] RISC-V: Fix bug for expand_const_vector interleave [PR118931]

2025-02-27 Thread Robin Dapp
Hi Pan, + poly_int64 base1_poly = rtx_to_poly_int64 (base1); + bool overflow_smode_p = false; + + if (!step1.is_constant ()) + overflow_smode_p = true; + else + { + int elem_count = XVECLEN (src, 0); + uint64_t step1_val

Re: [PATCH v2] RISC-V: Fix bug for expand_const_vector interleave [PR118931]

2025-02-26 Thread Robin Dapp
If you mean the last branch of interleave, I think it is safe because it leverages the merge to generate the result, instead of IOR. Only the IOR for the final result has this issue. Yep, I meant checking overflow before the initial if if (known_ge (step1, 0) && known_ge (step2, 0)
