Re: [r16-3760 Regression] FAIL: g++.target/i386/pr116896-1.C -std=gnu++23 scan-assembler-times \tjp\t 1 on Linux/x86_64

2025-09-11 Thread Robin Dapp
> FAIL: gcc.target/i386/pr116896.c scan-assembler-times \tjp\t 2 > FAIL: g++.target/i386/pr116896-1.C -std=gnu++20 scan-assembler-times \tjp\t > 1 > FAIL: g++.target/i386/pr116896-1.C -std=gnu++23 scan-assembler-times \tjp\t > 1 > FAIL: g++.target/i386/pr116896-1.C -std=gnu++26 scan-assembl

Re: [PATCH V3 2/2] RISC-V: Support vnclip idiom testcase [PR120378]

2025-09-11 Thread Robin Dapp
The test part is still OK of course. -- Regards Robin

Re: [PATCH] vect: Handle grouped accesses via gather/scatter.

2025-09-11 Thread Robin Dapp
Hmm, so the existing "punning" code for VMAT_STRIDED_SLP does tree vtype = vector_vector_composition_type (vectype, const_nunits / n, &ptype); if (vtype != NULL_TREE) {

Re: New optabs and IFN required for early break [bikeshed]

2025-09-11 Thread Robin Dapp
Ah great! Does it just take a mask? could you point me to some docs? You mean just a mask and not a length? No, (almost) all our insns are length-masked. Specs are here: https://github.com/riscvarchive/riscv-v-spec/blob/master/v-spec.adoc#165-vector-compress-instruction -- Regards Robin

Re: New optabs and IFN required for early break [bikeshed]

2025-09-11 Thread Robin Dapp
So AVX512 has vcompressp{d,s} and vexpandp{d,s} (but nothing for smaller integer element types). Those could be used for this but they have a vector result (and element zero would be the first active). But don't you possibly want the last inactive as well, dependent on whether this is a peeled/n
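
For readers less familiar with compress-style instructions, here is a minimal scalar sketch of the behavior discussed above: the active (mask-selected) elements are packed to the front of the result, so element zero is the first active one. The function name `compress_active` is made up for illustration, and the model ignores vector length and tail handling.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a vector compress (hypothetical helper, not a GCC or
   intrinsic API): pack the elements of SRC whose mask entry is nonzero
   to the front of DST, preserving their order.  Returns the number of
   elements written.  */
size_t
compress_active (const int32_t *src, const uint8_t *mask, size_t n,
                 int32_t *dst)
{
  size_t out = 0;
  for (size_t i = 0; i < n; i++)
    if (mask[i])
      dst[out++] = src[i];
  return out;
}
```

With src = {10, 20, 30, 40} and mask = {0, 1, 0, 1} the first two result elements are 20 and 40, and the return value is 2.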

[PATCH] vect: Try signed and unsigned gather offsets.

2025-09-11 Thread Robin Dapp
Hi, This patch adjusts vect_gather_scatter_fn_p to always check an offset type with swapped signedness (vs. the original offset argument). If the target supports the gather/scatter with the new offset type the offset is converted to it before emitting the gather/scatter. In the same way the cost

Re: [PATCH,RFC] RISC-V: Fix typo in tt-ascalon-d8's pipeline description [PR121878]

2025-09-10 Thread Robin Dapp
On today's RISC-V GCC patch review call, someone mentioned there might be an alternative that uses a hook instead. Jeff mentioned this type of check may be needed in other pipeline descriptions either now or in the future, so I thought I'd post what I have so we can discuss which form is preferre

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2025-09-10 Thread Robin Dapp
Jakub, you did the spaceship cmov adjustments last year. Is a change like the following where we introduce a cmov ok? It looks like two of the four functions in the test file were already branchless. I ran a full bootstrap and regtest again, this time the test is unchanged. I think I tested

Re: [PATCH,RFC] RISC-V: Fix typo in tt-ascalon-d8's pipeline description [PR121878]

2025-09-10 Thread Robin Dapp
Hi Robin, > diff --git a/gcc/config/riscv/tt-ascalon-d8.md b/gcc/config/riscv/tt-ascalon-d8.md > index a57c0b31a81..25b99b6129e 100644 > --- a/gcc/config/riscv/tt-ascalon-d8.md > +++ b/gcc/config/riscv/tt-ascalon-d8.md > @@ -285,38 +285,38 @@ (define_insn_reservation "tt_ascalon_d8_vec_ordered

Re: [PATCH] ifcvt: Clarify if_info.original_cost.

2025-09-10 Thread Robin Dapp
Robin, given the time since submission, I would suggest a fresh bootstrap and regression test on x86. Given the V4 passed on x86 and aarch64 in the past and earlier versions tested well on s390, I think just the sanity check on x86 is needed. There are a few regressions on riscv which are all

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-10 Thread Robin Dapp
Hi Kito, we discussed this in yesterday's patchwork sync. Would you mind sharing what the current LLVM implementation does and if this is written down/documented somewhere? In particular the chunk size we split large vectors. Like for a 1024b vector with the "128b ABI", does LLVM use LMUL8

Re: [PATCH] RISC-V: Check if we can vec_extract [PR121510].

2025-09-08 Thread Robin Dapp
I guess for those F16 move or vec_extract patterns it can still be supported even without zvfh/zvfhmin support, but those patterns are guarded by either ZVFH or ZVFHMIN now. However I think what I write above is kind of an optimization, and I think your fix is reasonable for the long term, it can

Re: [PATCH] RISC-V: Adjust tt-ascalon-d8 branch cost

2025-09-08 Thread Robin Dapp
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 07d40f459e3..bfd43fba101 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -659,7 +659,7 @@ static const struct riscv_tune_param tt_ascalon_d8_tune_info = { {COSTS_N_INSNS (3), COSTS_N_INSNS (3)},

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-07 Thread Robin Dapp
I am not sure about passing arguments in m2 but limited operation on m1 is worth spending time on that way because I think it's hard to prevent us from spilling that into the stack and reloading that into the register. And that code gen will kind of make VLS CC become useless due to the code gen

Re: [PATCH v1 1/4] RISC-V: Combine vec_duplicate + vmadd.vv to vmadd.vx on GR2VR cost

2025-09-07 Thread Robin Dapp
Should be a git diff issue, I noticed this before sending it out. Yeah, not your fault, and it's still legible. I was just surprised. The madd and macc can be merged into one combine define_expand as well as define_insn, so rename it to mul_then_plus (and add the madd part for define_insn) to indicate th

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-06 Thread Robin Dapp
Ok, so whenever we didn't split a vector into LMUL1-sized (128 here) chunks in the first place we cannot go back to LMUL1 any more. Doesn't that also mean that _if_ we split into 128-bit chunks (first case above) running on VLEN=256 would look like v8 = [0, 1, 2, 3, ?, ?, ?, ?] v9 = [4, 5, 6, 7

[PATCH] RISC-V: Check if we can vec_extract [PR121510].

2025-09-05 Thread Robin Dapp
Hi, For Zvfhmin a vector mode exists but the corresponding vec_extract does not. This patch checks that a vec_extract is available and otherwise falls back to standard handling. I cannot test myself right now so handing it off to the CI :) Regards Robin PR target/121510 gcc/ChangeLog

Re: [PATCH v1 1/4] RISC-V: Combine vec_duplicate + vmadd.vv to vmadd.vx on GR2VR cost

2025-09-04 Thread Robin Dapp
From: Pan Li This patch would like to combine the vec_duplicate + vmadd.vv to the vmadd.vx. For example, see the code below. The related pattern will depend on the cost of vec_duplicate from GR2VR. Then the late-combine will take action if the cost of GR2VR is zero, and reject the combination if t

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-04 Thread Robin Dapp
The layout will be different between VLEN=128 and VLEN=256 (and also any larger VLEN) Give a practical example: vec1 allocated into v8, and v9, the reg layout will be: VLEN = 128 v8 = [0, 1, 2, 3] v9 = [4, 5, 6, 7] VLEN=256 v8 = [0, 1, 2, 3, 4, 5, 6, 7] v9 = [?, ?, ?, ?, ?, ?, ?, ?] Then you c
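
The layout difference in the example above can be reproduced with a small sketch. It assumes 32-bit elements laid out across consecutive registers starting at v8; `element_position` and `struct lane_pos` are names invented here for illustration only.

```c
#include <assert.h>

/* Where element IDX of a vector lands when it is laid out across
   consecutive vector registers of VLEN_BITS each.  */
struct lane_pos
{
  unsigned reg;   /* Register offset from the first register (0 = v8).  */
  unsigned lane;  /* Element slot inside that register.  */
};

struct lane_pos
element_position (unsigned vlen_bits, unsigned elem_bits, unsigned idx)
{
  assert (elem_bits != 0 && vlen_bits >= elem_bits);
  unsigned lanes_per_reg = vlen_bits / elem_bits;
  struct lane_pos pos = { idx / lanes_per_reg, idx % lanes_per_reg };
  return pos;
}
```

For element 5, VLEN=128 gives register offset 1 (v9), lane 1, while VLEN=256 gives register offset 0 (v8), lane 5, matching the two layouts quoted above.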

Re: [PATCH v1 1/4] RISC-V: Combine vec_duplicate + vmadd.vv to vmadd.vx on GR2VR cost

2025-09-04 Thread Robin Dapp
So before we had vmacc_vx and now madd can be included. Is this somehow different to pred_mul_plus (without vx)? "mul then plus" sounds like there is some operand order that differs from the regular order but the multiplication is always first IIRC? The difference is just which operand is bei

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-04 Thread Robin Dapp
Hi Robin: Thanks for your try, but before I moving forward to debug that, I want to check with you: I got m2 for the following testcase with following commands: $ riscv64-unknown-linux-gnu-gcc test.c -march=rv64gcv -O2 -S -mrvv-max-lmul=m1 -o - ```c #include typedef int32_t int32x8_t __attribut

[PATCH] RISC-V: Use correct target in expand_vec_perm [PR121780].

2025-09-04 Thread Robin Dapp
Hi, This fixes a glaring mistake in yesterday's change to the expansion of vec_perm. We should of course move tmp_target into the real target and not the other way around. I wonder why my testing hasn't caught this... Anyway, regtested on rv64gcv_zvl512b and going to commit as obvious. Regard

Re: [PATCH] vect: Set prolog bound to 0 for VLA alignment [PR121523].

2025-09-04 Thread Robin Dapp
Given we use a poly_int64 for bound_epilog elsewhere now the best thing to do would be to have a poly_int64 for bound_prolog as well. For the scaling we'd use estimated_poly_value (align_in_elems) then (I guess for alignment the division by two doesn't make too much sense). For the niter bound w

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-03 Thread Robin Dapp
There are still a few (5) testsuite failures, though. It looks like most of them are similar, latent and due to us not handling small VLS BImodes properly. Maybe we still need diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc index d2edffb36a2..d2d99b828ac 100644 --- a/gc

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-03 Thread Robin Dapp
Forgot attachment... -- Regards Robin From a88c6ef5f11aa1f4baeb8640ab355b76134adf6c Mon Sep 17 00:00:00 2001 From: Robin Dapp Date: Wed, 3 Sep 2025 15:51:41 +0200 Subject: [PATCH] RISC-V: Always allow all VLS modes. This patch always enables all VLS modes but does not allow them into hard

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-03 Thread Robin Dapp
I tried to enable all modes but disable all patterns except move-related patterns (without that it will ICE at expand time) and it results in terrible code gen; here is a practical example: The expand ICE is due to us just checking for mode availability and not vls_mode_valid_p as well. W

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-03 Thread Robin Dapp
Here is the branch in case you are interested in playing with that: https://github.com/kito-cheng/gcc/tree/kitoc/vls-cc-testing Thanks, I'll try my luck and report back. -- Regards Robin

Re: [PATCH] vect: Set prolog bound to 0 for VLA alignment [PR121523].

2025-09-03 Thread Robin Dapp
Attached is the proper version that has been regtested on x86, regtested on aarch64 and rv64gcv_zvl512b. Regards Robin [PATCH v4] vect: Estimate prolog bound for VLA alignment. Since peeling and version for alignment for VLA modes was introduced (r16-3065-geee51f9a4b6) we have been seeing a lo

Re: [PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-09-03 Thread Robin Dapp
So even though our strided loads do support signed strides, we cannot go via the recognize gather -> recognize strided offset -> strided load route because the initial signed-offset gather will be unsupported :/ I did want to relax the initial STMT_VINFO_GATHER_SCATTER_P detection to not requ

Re: [PATCH] vect: Set prolog bound to 0 for VLA alignment [PR121523].

2025-09-02 Thread Robin Dapp
Argh... I realized I sent an old version of the patch, sorry. Will update tomorrow. -- Regards Robin

[PATCH] RISC-V: Fix is_vlmax_len_p and use for strided ops.

2025-09-02 Thread Robin Dapp
Hi, This patch changes is_vlmax_len_p to handle VLS modes properly. Before we would check if len == GET_MODE_NUNITS (mode). This works for VLA modes but not necessarily for VLS modes. We regularly have e.g. small VLS modes where LEN equals their number of units but which do not span a full vec

Re: [PATCH v5 1/2] RISC-V: Fix can_find_related_mode_p for VLS types

2025-09-02 Thread Robin Dapp
@@ -3047,7 +3047,7 @@ can_find_related_mode_p (machine_mode vector_mode, scalar_mode element_mode, GET_MODE_SIZE (element_mode), nunits)) return true; if (riscv_v_ext_vls_mode_p (vector_mode) - && multiple_p (TARGET_MIN_VLEN * TARGET_MAX_LMUL, + && multip

[PATCH v2] RISC-V: Handle overlap in expand_vec_perm PR121742.

2025-09-02 Thread Robin Dapp
Hi, In a two-source gather we unconditionally overwrite target with the first gather's result already. If op1 == target this clobbers the source operand for the second gather. This patch uses a temporary in that case. Posting a v2 for the CI with the proper PR# and reg_overlap_mentioned_p. Re
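
The clobbering problem can be illustrated with a scalar model. This is only a sketch, not the expander code: `permute_two_sources` and `MAX_ELTS` are hypothetical, and unlike the patch (which uses a temporary only in the overlapping case) the sketch always goes through a temporary for simplicity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_ELTS 64

/* Two-source permute built from two gathers: sel[i] < n picks
   op0[sel[i]], otherwise op1[sel[i] - n].  Every sel[i] is assumed to
   be below 2 * n.  TARGET may alias OP0 or OP1.  Returns 0 on success,
   -1 if N does not fit the scratch buffer.  */
int
permute_two_sources (int32_t *target, const int32_t *op0,
                     const int32_t *op1, const size_t *sel, size_t n)
{
  if (n > MAX_ELTS)
    return -1;

  int32_t tmp[MAX_ELTS];

  /* First gather: write into a temporary rather than TARGET so that a
     TARGET aliasing OP1 is not clobbered before the second gather.  */
  for (size_t i = 0; i < n; i++)
    tmp[i] = sel[i] < n ? op0[sel[i]] : 0;

  /* Second gather: fill in the lanes that select from OP1.  */
  for (size_t i = 0; i < n; i++)
    if (sel[i] >= n)
      tmp[i] = op1[sel[i] - n];

  /* Only now move the temporary into the real target.  */
  memcpy (target, tmp, n * sizeof *tmp);
  return 0;
}
```

Calling it with target == op1 still reads the original op1 values in the second gather, which is the property the fix is after.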

Re: [PATCH] vect: Set prolog bound to 0 for VLA alignment [PR121523].

2025-09-02 Thread Robin Dapp
Yeah, it's not a strict bound, so the function needs to return -1 aka UNKNOWN. But how this -1 should be interpreted differs on context. But for sure -1 cannot be interpreted as actual bound. For frequency scaling I'd use the same logic as for costing - use estimated_poly_value / 2 In the att

Re: [PATCH 2/2] RISC-V: Always register vector built-in functions during LTO [PR110812]

2025-09-02 Thread Robin Dapp
LGTM and I just have a wording nit that was already there before your patch. When reading it I really wondered why we need to "pollute" the flags. Wouldn't a function that performs a "backup" make sense, analogous to ..._restore? And then rename the pollute function to something like enable_

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-02 Thread Robin Dapp
Yeah, I am not insisting it must honor the type, but I am not sure we should move this in that way now, it seems possible to make a vector mode legal but still split by controlling optab_supported_p I think. The issue with the ABI is that the modes are not available at all right now I suppose?

Re: [PATCH] RISC-V: Handle overlap in expand_vec_perm PR121724.

2025-09-02 Thread Robin Dapp
OK. Though I think you've got the wrong PR. 121724 is a C++ rejects-valid bug :-) Oops, yes, thanks. Going to commit with the two changes after the CI is clean. -- Regards Robin

[PATCH] RISC-V: Handle overlap in expand_vec_perm PR121724.

2025-09-01 Thread Robin Dapp
Hi, In a two-source gather we unconditionally overwrite target with the first gather already. If op1 == target this clobbers the source operand for the second gather. This patch uses a temporary in that case. Regtested on rv64gcv_zvl512b. Regards Robin PR target/121724 gcc/ChangeLog

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-01 Thread Robin Dapp
The main reason is that I’m working on the fixed-length-vector calling convention [1]. For that, I need all these VLS types to be available so that arguments can be passed correctly. I know LMUL choice is very u-arch specific, so I agree the option makes sense for the vectorizer. But when people

Re: [PATCH v5 1/2] RISC-V: Fix can_find_related_mode_p for VLS types

2025-09-01 Thread Robin Dapp
I would prefer to keep using GET_MODE_SIZE here since it makes this function more readable; a few lines above also use GET_MODE_SIZE, so it might confuse people (at least me :P) if we use GET_MODE_PRECISION here. Then maybe change 8 to BITS_PER_UNIT? Otherwise one could think t

Re: [PATCH v5 2/2] RISC-V: Allow VLS types using up to LMUL 8

2025-09-01 Thread Robin Dapp
We used to apply -mrvv-max-lmul= to limit VLS code gen, auto vectorizer, and builtin string function expansion. But I think the VLS code gen part doesn't need this limit, since it only happens when the user explicitly writes vector types. For example, int32x8_t under -mrvv-max-lmul=m1 with VLEN=1

Re: [PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-08-29 Thread Robin Dapp
I think it might be possible that refactoring how we do VMAT_STRIDED_SLP vs VMAT_GATHER/SCATTER, at least and possibly specifically for the case of emulated handling would be a good thing. But it'll require experiments and see how it all fits together. I started experimenting some days ago and

Re: [PATCH] vect: gather/scatter scale fallback.

2025-08-29 Thread Robin Dapp
Thinking about it some more, it might make sense to do the sign swap tries inside vect_gather_scatter_fn_p as well. That wouldn't pollute the callers. I'm still pondering how safe swapping the sign is here. If we have signed indices there won't be any overflow and we should be able to switch t

Re: [PATCH] vect: Set prolog bound to 0 for VLA alignment [PR121523].

2025-08-29 Thread Robin Dapp
This is v2, changed to estimated_poly_value / 2. Regtested on rv64gcv_zvl512b and aarch64 via qemu. I looked at the test suite results more closely now. While those apply_scale ICEs vanish with the patch there are still a few execution failures with the VLA peeling patch remaining: One is

Re: [PATCH] vect: Set prolog bound to 0 for VLA alignment [PR121523].

2025-08-29 Thread Robin Dapp
This is v2, changed to estimated_poly_value / 2. Regtested on rv64gcv_zvl512b and aarch64 via qemu. since peeling and version for alignment for VLA modes was introduced (r16-3065-geee51f9a4b6) we have been seeing a lot of test suite failures like internal compiler error: in apply_scale, at pr

[PATCH] vect: Set prolog bound to 0 for VLA alignment [PR121523].

2025-08-28 Thread Robin Dapp
Hi, since peeling and version for alignment for VLA modes was introduced (r16-3065-geee51f9a4b6) we have been seeing a lot of test suite failures like internal compiler error: in apply_scale, at profile-count.h:1187 This is because vect_gen_prolog_loop_niters sets the prolog bound to -1 in cas

[PATCH] vect: gather/scatter scale fallback.

2025-08-28 Thread Robin Dapp
Hi, currently for RVV gathers/scatters we accept any scale and extension in the optabs and "just" extend the offset before scaling it properly. This causes two major problems: - These operations are hidden from the vectorizer and thus not costed appropriately. - When the vectorizer chooses a f
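
As a rough illustration of what "extend the offset before scaling" means, the sketch below compares a gather address computed with a built-in scale against one computed from offsets that were widened and multiplied explicitly beforehand. All function names are hypothetical and the 32-bit signed offset type is an assumption; the point is only that the explicit form performs the same arithmetic as visible, separate operations.

```c
#include <stddef.h>
#include <stdint.h>

/* Address of lane I when the gather itself applies SCALE: the narrow
   offset is sign-extended to 64 bits and then scaled.  */
uintptr_t
gather_addr_scaled (uintptr_t base, const int32_t *offsets, size_t i,
                    unsigned scale)
{
  return base + (uintptr_t) ((int64_t) offsets[i] * (int64_t) scale);
}

/* Fallback: widen and multiply the offsets as explicit operations so
   that a scale-1 gather can be used afterwards.  */
void
prescale_offsets (const int32_t *offsets, int64_t *out, size_t n,
                  unsigned scale)
{
  for (size_t i = 0; i < n; i++)
    out[i] = (int64_t) offsets[i] * (int64_t) scale;
}

/* Address of lane I for a scale-1 gather on pre-scaled offsets.  */
uintptr_t
gather_addr_unscaled (uintptr_t base, const int64_t *offsets, size_t i)
{
  return base + (uintptr_t) offsets[i];
}
```

Both routes yield the same addresses; in the second one the widening and the multiplication are ordinary operations that can be costed on their own.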

Re: [PATCH v1 0/3] RISC-V: Combine vec_duplicate + vnmsac.vv to vnmsac.vx on GR2VR cost

2025-08-28 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vnmsac.vv into vnmsac.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. LGTM. -- Regards Robin

Re: [PATCH] RISC-V: Add pattern for vector-scalar floating-point min

2025-08-28 Thread Robin Dapp
LGTM. Are the IEEE-min/max variants in your plans as well? I haven't checked if we propagate into those at all (unspec), though. -- Regards Robin

Re: [PATCH V2 2/2] RISC-V: Support vnclip idiom testcase [PR120378]

2025-08-28 Thread Robin Dapp
This patch contains testcases for PR120378 after the change made to support the vnclipu variant of the SAT_TRUNC pattern. PR target/120378 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr120378-1.c: New test. * gcc.target/riscv/rvv/autovec/pr120378-2.c: New tes

Re: [PATCH] vect: Extend peeling and versioning for alignment to VLA modes

2025-08-27 Thread Robin Dapp
we're seeing several dozens of ICEs in apply_scale since this patch (PR121523). I didn't pay too much attention due to vacation etc. but now coming back to this. Any specific spot I should start looking? I had a quick look and part? of the issue is that vect_gen_prolog_loop_niters returns -1

Re: [PATCH] vect: Extend peeling and versioning for alignment to VLA modes

2025-08-25 Thread Robin Dapp
Hi Pengfei, we're seeing several dozens of ICEs in apply_scale since this patch (PR121523). I didn't pay too much attention due to vacation etc. but now coming back to this. Any specific spot I should start looking? -- Regards Robin

Re: [PATCH v1 0/4] RISC-V: Combine vec_duplicate + vmacc.vv to vmacc.vx on GR2VR cost

2025-08-21 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vmacc.vv into vmacc.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. LGTM, thanks. -- Regards Robin

Re: [PATCH] RISC-V: testsuite: Fix DejaGnu support for riscv_zvfh

2025-08-21 Thread Robin Dapp
OK. -- Regards Robin

Re: [PATCH] RISC-V: Expand const_vector with 2 elts per pattern.

2025-08-21 Thread Robin Dapp
This is the one git was complaining about on your end? b4 gets it for me via b4 shazam https://inbox.sourceware.org/gcc-patches/dbtiy6uch8xb.3522x0caqi...@gmail.com/ it's a little mangled: the "Hi," is in there, and there's two different line-breaking lengths. So maybe something's odd wi

Re: [PATCH 1/3][v2] Allow fully masked loops with legacy gather/scatter

2025-08-06 Thread Robin Dapp
+ elsvals) + || gs_info->decl != NULL_TREE) Does GATHER_SCATTER_LEGACY_P work here? Or is ifn != IFN_LAST? -- Regards Robin

Re: [PATCH v1] RISC-V: Refactor the vec_duplicate cost on gpr/fpr2vr-cost param

2025-08-06 Thread Robin Dapp
Looks much more reasonable, thanks. I assume the test expectation changes are necessary because the subrtx check captures all operations and with the default settings (scalar-vec-cost = 0) we always prefer vv over vx/vf? -- Regards Robin

Re: [PATCH 2/2] RISC-V: Support vnclip idiom testcase [PR120378]

2025-08-06 Thread Robin Dapp
This patch contains the testcase in PR120378 after the change made to support the vnclipu variant of the SAT_TRUNC pattern. PR target/120378 gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/pr120378.c: New test. Signed-off-by: Edwin Lu --- .../gcc.target/riscv/rvv/auto

[PATCH] RISC-V: Expand const_vector with 2 elts per pattern.

2025-08-04 Thread Robin Dapp
Hi, In PR121334 we are asked to expand a const_vector of size 4 with poly_int elements. It has 2 elts per pattern so is neither a const_vector_duplicate nor a const_vector_stepped. We don't allow this kind of constant in legitimate_constant_p but expr apparently still wants us to expand it unde

Re: [PATCH v1 1/2] RISC-V: Combine vec_duplicate + vmerge.vv to vmerge.vx on GR2VR cost

2025-08-04 Thread Robin Dapp
@@ -3971,15 +3971,20 @@ get_vector_binary_rtx_cost (rtx x, int scalar2vr_cost) rtx op_0; rtx op_1; - if (GET_CODE (x) == UNSPEC) -{ - op_0 = XVECEXP (x, 0, 0); - op_1 = XVECEXP (x, 0, 1); -} - else + switch (GET_CODE (x)) { - op_0 = XEXP (x, 0); - op_1

Re: [PATCH] tree-optimization/120687 - avoid disturbing reduction chains in reassoc

2025-07-29 Thread Robin Dapp
- if (len >= 3 + if (!reassoc_insert_powi_p + && len >= 3 && (!has_fma /* width > 1 means ranking ops results in better parallelism. Check curre

Re: [PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-07-28 Thread Robin Dapp
Yeah, VMAT_STRIDED_SLP is what VMAT_ELEMENTWISE was to non-SLP, though how we emit the contiguous part of the SLP group depends and it could be elementwise as fallback. For the single-element case (and only for that one AFAICT) we can switch to VMAT_GATHER_SCATTER. Is the idea to relax that an

Re: [PATCH] RISC-V: Fix some generic-vector-ooo pipeline description issues

2025-07-28 Thread Robin Dapp
-;; Vector crypto, assumed to be a generic operation for now. -(define_insn_reservation "vec_crypto" 4 +;; Vector population count +(define_insn_reservation "vec_pop" 4 (and (eq_attr "tune" "generic_ooo,generic") - (eq_attr "type" "crypto,vclz,vctz,vcpop")) + (eq_attr "type" "vcpop"

Re: [RFC] RISC-V: support vnclip idiom [PR120378]

2025-07-28 Thread Robin Dapp
Hi Edwin, sorry for the slow reply. Currently this patch only supports clipping and returning an unsigned narrow type. I'm unsure if this is the best way to approach the problem as there is a similar optab .SAT_TRUNC which performs a similar operation. The main difference between .NARROW_CLIP a

Re: [PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-07-25 Thread Robin Dapp
That would definitely be nice to have for both gather and stride loads I'm not sure I like the direction that's heading ;) So the loop I'm targeting is x264's satd: for( int i = 0; i < 4; i++, pix1 += i_pix1, pix2 += i_pix2 ) { a0 = (pix1[0] - pix2[0])... a1 = (pix1[1] -

Re: [PATCH v2 3/3] vect: Use strided loads for VMAT_STRIDED_SLP.

2025-07-25 Thread Robin Dapp
I'm not sure whether handling this case as part of VMAT_STRIDED_SLP is wise. IIRC we do already choose VMAT_GATHER_SCATTER for some strided loads, so why not do strided load/store handling as part of gather/scatter handling? Now that we can deal with gather/scatter misalignment I think we can c

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-25 Thread Robin Dapp
I've folded the "vectorizer status" into the beginning of the BoF, so "only" two slots from my side. Do you still need/want some status from the riscv vector side for the BoF? If so, what would that entail? Rather a look back on what has been done or an outlook of what we're looking at? -- R

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-25 Thread Robin Dapp
OK, so actually generating code with that vector(1) is bad (slower than using scalar code)? Was that the same for PR121048? The general situation is similar but IIRC we had a real vector mode there. There the code didn't look terrible apart from using very small vectors (2 elements). Here I gu

Re: [PATCH] RISC-V: Prepare dynamic LMUL heuristic for SLP.

2025-07-25 Thread Robin Dapp
I was a bit concerned about the stmt_vec_info -> slp_tree hash map at first, but I realized that it’s just a temporary hack, so LGTM :) Thanks, going to commit in a while. Of course you know: "There is nothing more permanent than a temporary solution." :) -- Regards Robin

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-25 Thread Robin Dapp
So what was prevailing_mode then? RVVM2SI, so != word_mode, and basically two glued 32-bit vector regs. We get that from the first call with innermode SI. But /* Fall back to using mode_for_vector, mostly in the hope of being able to use an integer mode. */ if (known_eq

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-25 Thread Robin Dapp
It's probably changes from the RVV cost modeling behavior, the patches are of course not supposed to change code generation. Looks like a proper corner case... From what I can see the difference to before is that we now always call get_related_vectype_for_scalar_type with VOIDmode while we u

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-24 Thread Robin Dapp
It's probably changes from the RVV cost modeling behavior, the patches are of course not supposed to change code generation. These tests don't use dynamic LMUL (which needs a special flag and is not generally active) so it would be odd if they were affected by the costing changes. In particul

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-24 Thread Robin Dapp
I do see regressions for zve32x-3.c et al. Those might be related to the recently fixed tests regarding partial vectorization with vector(1) types but I haven't checked further for now. The regressions are "scan failures". One loop is not loop vectorized any more but SLP vectorized and the f

Re: [PATCH v3 4/5] vect: Misalign checks for gather/scatter.

2025-07-24 Thread Robin Dapp
Stefan kindly ran a regtest on s390 which looked OK as well. I re-tested everything one more time and will commit soon. The patches were bootstrapped individually on x86 (and built on riscv) so I hope it's safe to not squash them. Thanks for the guidance on that patch/series. -- Regards Rob

Re: [PATCH] RISC-V: Rework broadcast handling [PR121073].

2025-07-23 Thread Robin Dapp
Note that your pr121073 test fails :-) So you'll need to adjust something there. OK with pr121073.c fixed... Pushed with -mabi=lp64d. There's nothing I have forgotten more often... -- Regards Robin

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-23 Thread Robin Dapp
't checked further for now. These tests don't use the LMUL heuristic so the failures can't be due to it. I'll see if I can have a look tomorrow. Regards Robin commit ac4c46ee66380fc81b4f4dc0138956e1f2c519c7 Author: Robin Dapp Date: Wed Jul 23 15:31:38 2025 +0200 slp type map

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-23 Thread Robin Dapp
Hmm, we only have one STMT_VINFO_VECTYPE in need_additional_vector_vars_p. I think we can just save the mode/vectype we need during add_stmt_cost and get it later, similar to STMT_VINFO_TYPE. Testing a patch. -- Regards Robin

Re: [PATCH 4/4] Do not set STMT_VINFO_VECTYPE from dataref analysis

2025-07-23 Thread Robin Dapp
Generally, nobody is really happy with it :) It has been limping along for a while and not been used a lot at all. I also see it does compute post-dominators and scrap them for each costing done! For larger functions with many loops that's going to be slow (it's O(function-size)). I think

Re: [PATCH v3 4/5] vect: Misalign checks for gather/scatter.

2025-07-22 Thread Robin Dapp
Note if-conversion emits IFN_MASK_LOAD/STORE, only the vectorizer later emits the LEN variants. So this is about whether there are (might) be uarchs that have vector aligned loads (aka target alignment is sizeof(vector)) and in addition to that have support for misaligned loads but those with sti

Re: [PATCH v3 4/5] vect: Misalign checks for gather/scatter.

2025-07-22 Thread Robin Dapp
So this is the only part I think is odd - there is a dataref, it just has only DR_REF as relevant data. I would have expected we can adjust vect_supportable_dr_alignment to deal with the scatter/gather case. I'm OK with doing it how you did it here, but seeing the /* For now assume all condit

Re: [PATCH] [RFC] Move STMT_VINFO_TYPE to SLP_TREE_TYPE

2025-07-22 Thread Robin Dapp
The more I look at our heuristic the more it appears due for a rewrite. But that's really not in my plans right now. I just sent a riscv patch that does the necessary preparations so you can basically s/STMT_VINFO_TYPE (stmt_info)/SLP_TREE_TYPE (node)/ once it lands. I regtested with your patc

[PATCH] RISC-V: Prepare dynamic LMUL heuristic for SLP.

2025-07-22 Thread Robin Dapp
Hi, This patch prepares the dynamic LMUL vector costing to use the coming SLP_TREE_TYPE instead of the (to-be-removed) STMT_VINFO_TYPE. Even though the whole approach should be reviewed and adjusted at some point, the patch chooses the path of least resistance and uses a hash map for the stmt_in

[PATCH] RISC-V: testsuite: Fix vx_vf_*run-1-f16.c run tests.

2025-07-22 Thread Robin Dapp
Hi, This patch fixes the vf_vfmacc-run-1-f16.c test failures on rv32 by adding zvfh requirements as well as options to the test and the target harness. Regtested on rv64gcv_zvl512b and rv32gcv_zvl512b. Going to commit as obvious if the CI agrees that it's obvious ;) Regards Robin gcc/testsuit

[PATCH] RISC-V: Rework broadcast handling [PR121073].

2025-07-22 Thread Robin Dapp
Hi, During the last weeks it became clear that our current broadcast handling needs an overhaul in order to improve maintainability. PR121073 showed that my intermediate fix wasn't enough and caused regressions. This patch now goes a first step towards untangling broadcast (vmv.v.x), "set first"

Re: [PATCH] [RFC] Move STMT_VINFO_TYPE to SLP_TREE_TYPE

2025-07-21 Thread Robin Dapp
There is currently no way to mimic this, the original idea would have been that you record the per stmt info during add_stmt cost hook time and then process that data at finish_cost time. With SLP you could in theory walk the SLP graph via the instances vector of the vinfo. But I’m not sure w

Re: [PATCH v1 0/5] RISC-V: Combine vec_duplicate + vaaddu.vv to vaaddu.vx on GR2VR cost

2025-07-21 Thread Robin Dapp
This patch would like to introduce the combine of vec_dup + vaaddu.vv into vaaddu.vx on the cost value of GR2VR. The late-combine will take place if the cost of GR2VR is zero, or reject the combine if non-zero like 1, 2, 15 in test. There will be two cases for the combine: The series is OK, th

Re: [PATCH] [RFC] Move STMT_VINFO_TYPE to SLP_TREE_TYPE

2025-07-18 Thread Robin Dapp
Can the risc-v people try to sort out this up to a point where I can just s/STMT_VINFO_TYPE/SLP_TREE_TYPE there? I think for us this mainly (only?) concerns the dynamic LMUL heuristic. Currently we go through all vectorized instructions of the loop's blocks, lookup the stmt_vec_info and then get

Re: [PATCH v2] RISC-V: Support RVVDImode for avg3_floor auto vect

2025-07-15 Thread Robin Dapp
The avg3_floor pattern leverages the add and shift RTL with the DOUBLE_TRUNC mode iterator. That is, the RVVDImode iterator will generate avg3rvvsimode_floor, so only the element sizes QI, HI and SI are allowed. Thus, this patch would like to support DImode by the standard name, with the iterator V_VLSI_

[PATCH] expand: Allow fixed-point arithmetic for RDIV_EXPR.

2025-07-15 Thread Robin Dapp
Hi, r16-2175-g5aa21765236730 introduced an assert for floating-point modes when expanding an RDIV_EXPR but forgot fixed-point modes. This patch adds ALL_FIXED_POINT_MODE_P to the assert. Bootstrap and regtest running on x86, aarch64, and power10. Regtested on rv64gcv. Regtest on arm running,

[PATCH] RISC-V: Fix vsetvl merge rule.

2025-07-14 Thread Robin Dapp
Hi, In PR120297 we fuse vsetvl e8,mf2,... vsetvl e64,m1,... into vsetvl e64,m4,... Individually, that's ok but we also change the new vsetvl's demand to "SEW only" even though the first original one demanded SEW >= 8 and ratio = 16. As we forget the ratio after the merge we find that the vse
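
To make the ratio arithmetic in the snippet above concrete, here is a minimal sketch in C. It is not taken from the patch; the struct and function names are hypothetical and only model the SEW/LMUL ratio as defined in the RISC-V vector spec. It shows that e64,m4 keeps the ratio of 16 that e8,mf2 demanded, while another e64 setting such as e64,m1 would satisfy a "SEW only" demand but not the ratio.

```c
#include <stdbool.h>

/* Hypothetical model of a vsetvl configuration.  LMUL is stored as a
   fraction lmul_num/lmul_den, so mf2 is 1/2 and m4 is 4/1.  */
struct vtype_cfg
{
  unsigned sew;
  unsigned lmul_num;
  unsigned lmul_den;
};

/* SEW/LMUL ratio.  For a fixed VLEN, equal ratios imply equal VLMAX.  */
static unsigned
sew_lmul_ratio (struct vtype_cfg v)
{
  return v.sew * v.lmul_den / v.lmul_num;
}

/* The demand of the first vsetvl as described in the mail:
   SEW >= 8 and ratio == 16.  */
static bool
satisfies_first_demand (struct vtype_cfg v)
{
  return v.sew >= 8 && sew_lmul_ratio (v) == 16;
}

/* A "SEW only" demand for e64, i.e. what remains once the ratio
   is forgotten.  */
static bool
satisfies_sew_only_demand (struct vtype_cfg v)
{
  return v.sew == 64;
}
```

With these definitions, e8,mf2 is `{8, 1, 2}` and e64,m4 is `{64, 4, 1}`; both have ratio 16. e64,m1 is `{64, 1, 1}` with ratio 64, so it passes the SEW-only check while violating the original ratio demand.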

Re: [PATCH v2] RISC-V: Vector-scalar widening multiply-(subtract-)accumulate [PR119100]

2025-07-14 Thread Robin Dapp
This pattern enables the combine pass (or late-combine, depending on the case) to merge a float_extend'ed vec_duplicate into a plus-mult or minus-mult RTL instruction. Before this patch, we have three instructions, e.g.: fcvt.s.h fa5,fa5 vfmv.v.f v24,fa5 vfmadd.vv v8,v24,v1

Re: [PATCH] expand: ICE if asked to expand RDIV with non-float type.

2025-07-14 Thread Robin Dapp
For the record, the Linaro CI notified me that this caused regressions: Produces 2 regressions: | | regressions.sum: | Running gcc:gcc.dg/dg.exp ... | FAIL: gcc.dg/pr103248.c (internal compiler error: in optab_for_tree_code, at optabs-tree.cc:85) | FAIL: gcc.dg/pr103248.c (test for excess e

[PATCH v3 4/5] vect: Misalign checks for gather/scatter.

2025-07-11 Thread Robin Dapp
This patch adds simple misalignment checks for gather/scatter operations. Previously, we assumed that those perform element accesses internally so alignment does not matter. The riscv vector spec however explicitly states that vector operations are allowed to fault on element-misaligned accesses.
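
As an illustration of what "element-misaligned" means for a gather, the following C sketch checks whether every gathered address is a multiple of the element size. It is not the vectorizer's check; the function name and parameters are hypothetical, and the address computation base + offset * scale is assumed to follow the usual gather/scatter model.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if every element address base + offsets[i] * scale is a
   multiple of elem_size, i.e. no single element access is misaligned.  */
static bool
gather_elements_aligned (uintptr_t base, const ptrdiff_t *offsets,
                         size_t n, ptrdiff_t scale, size_t elem_size)
{
  for (size_t i = 0; i < n; i++)
    {
      /* Unsigned wrap-around makes negative offsets behave as expected.  */
      uintptr_t addr = base + (uintptr_t) (offsets[i] * scale);
      if (addr % elem_size != 0)
        return false;
    }
  return true;
}
```

For 4-byte elements, offsets {0, 1, 2} with scale 4 from an aligned base stay aligned, whereas the same offsets with scale 1, or with a base that is itself off by two bytes, produce element-misaligned addresses on which a target may fault.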

[PATCH v3 1/5] ifn: Add helper functions for gather/scatter.

2025-07-11 Thread Robin Dapp
This patch adds access helpers for the gather/scatter offset and scale parameters. gcc/ChangeLog: * internal-fn.cc (expand_scatter_store_optab_fn): Use new function. (expand_gather_load_optab_fn): Ditto. (internal_fn_offset_index): Ditto. (internal_fn_scale

[PATCH v3 5/5] riscv: testsuite: Fix misalignment check.

2025-07-11 Thread Robin Dapp
This fixes a thinko in the misalignment check. If we want to check for vector misalignment support we need to load 16-byte elements, not 8-byte elements that will never be misaligned. gcc/testsuite/ChangeLog: * lib/target-supports.exp: Fix misalignment check. --- gcc/testsuite/lib/targe

[PATCH v3 3/5] vect: Add is_gather_scatter argument to misalignment hook.

2025-07-11 Thread Robin Dapp
This patch adds an is_gather_scatter argument to the support_vector_misalignment hook. All targets but riscv do not care about alignment for gather/scatter so return true for is_gather_scatter. gcc/ChangeLog: * config/aarch64/aarch64.cc (aarch64_builtin_support_vector_misalignment):

[PATCH v3 2/5] vect: Add helper macros for gather/scatter.

2025-07-11 Thread Robin Dapp
This encapsulates the IFN and the builtin-function way of handling gather/scatter via three defines: GATHER_SCATTER_IFN_P GATHER_SCATTER_LEGACY_P GATHER_SCATTER_EMULATED_P and introduces a helper define for SLP operand handling as well. gcc/ChangeLog: * tree-vect-slp.cc (GATHER_SC

[PATCH v3 0/5] vect: Misalign for gather/scatter.

2025-07-11 Thread Robin Dapp
an alias pointer. I deferred that for now, though. The whole series was regtested and bootstrapped on x86, aarch64, and power10 and I built the patches individually on x86 as well as riscv. It was also regtested on rv64gcv_zvl512b. Robin Dapp (5): ifn: Add helper functions for gather/scatter. vec

[PATCH] expand: ICE if asked to expand RDIV with non-float type.

2025-07-10 Thread Robin Dapp
Hi, this patch adds asserts that ensure we only expand an RDIV_EXPR with actual float mode. It also replaces the RDIV_EXPR in setting a vectorized loop's length by EXACT_DIV_EXPR. The code in question is only used with length-control targets (riscv, powerpc, s390). Bootstrapped and regtested o

[PATCH v2] RISC-V: Make zero-stride load broadcast a tunable.

2025-07-10 Thread Robin Dapp
Hi, Changes from v1: - Use HImode broadcast instead of float broadcast, saving two conversion insns. Let's be daring and leave the thorough testing to the CI first while my own testing is in progress :) This patch makes the zero-stride load broadcast idiom dependent on a uarch-tunable "us
