[PATCH v1] RISC-V: Add ashiftrt operand 2 for vector avg_floor and avg_ceil

2025-07-19 Thread pan2 . li
From: Pan Li According to the semantics of avg_floor and avg_ceil below: floor: op0 = (narrow) (((wide) op1 + (wide) op2) >> 1); ceil: op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1); that is, the ashiftrt should have (const_int 1) as its operand 2, but it seems to be missing. Thus, add it back to align t
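
A minimal scalar sketch of the two formulas above follows. It assumes NT = int32_t and WT = int64_t, and the function names are ours rather than GCC's. The `1` in `>> 1` is the shift amount that the patch adds back as operand 2 of the ashiftrt.

```c
#include <stdint.h>

/* avg_floor: widen both operands, add, then shift right by 1.
   GCC implements >> on a negative signed value as an arithmetic shift. */
int32_t avg_floor_i32 (int32_t op1, int32_t op2)
{
  return (int32_t) (((int64_t) op1 + (int64_t) op2) >> 1);
}

/* avg_ceil: same as avg_floor, but add 1 before the shift. */
int32_t avg_ceil_i32 (int32_t op1, int32_t op2)
{
  return (int32_t) (((int64_t) op1 + (int64_t) op2 + 1) >> 1);
}
```

The sum is computed in the wide type, so `avg_floor_i32 (INT32_MAX, INT32_MAX)` returns INT32_MAX instead of overflowing.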

[PATCH v1] RISC-V: Refine the test case for vector avg_floor and avg_ceil [NFC]

2025-07-18 Thread pan2 . li
From: Pan Li The previous test case doesn't use the right test helper macro; it should be DEF_AVG_0_WRAP instead of DEF_AVG_0. We prefer the test function name test_avg_floor_int64_t_int32_t_0 over test_avg_floor_WT_NT_0 for DEF_AVG_0(WT, NT). The test suites below passed for
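
A `_WRAP` macro matters because of a standard C preprocessor rule. A macro argument used as an operand of `##` is pasted without being macro-expanded first. The post does not show the testsuite's DEF_AVG_0 or DEF_AVG_0_WRAP definitions. The sketch below therefore uses hypothetical stand-ins (DEF_AVG_FLOOR, DEF_AVG_FLOOR_WRAP) to show how the two function names arise.

```c
#include <stdint.h>

/* Hypothetical helper: WT and NT are operands of ## in the function name,
   so they are pasted exactly as written, without macro expansion. */
#define DEF_AVG_FLOOR(WT, NT)                        \
  NT test_avg_floor_##WT##_##NT##_0 (NT a, NT b)     \
  {                                                  \
    return (NT) (((WT) a + (WT) b) >> 1);            \
  }

/* Hypothetical wrapper: its arguments are not next to # or ##, so they are
   fully macro-expanded before DEF_AVG_FLOOR receives them. */
#define DEF_AVG_FLOOR_WRAP(WT, NT) DEF_AVG_FLOOR (WT, NT)

#define WT int64_t
#define NT int32_t

/* Defines int32_t test_avg_floor_WT_NT_0 (int32_t, int32_t). */
DEF_AVG_FLOOR (WT, NT)

/* Defines int32_t test_avg_floor_int64_t_int32_t_0 (int32_t, int32_t). */
DEF_AVG_FLOOR_WRAP (WT, NT)

#undef WT
#undef NT
```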

[PATCH v1] RISC-V: Support RVVDImode for avg3_ceil auto vect

2025-07-16 Thread pan2 . li
From: Pan Li Like the avg3_floor pattern, avg3_ceil has a similar issue: it lacks RVV DImode support. Thus, this patch adds DImode support via the standard name, with the iterator V_VLSI_D. The test suites below passed for this patch series. * The rv64gcv fully regr

[PATCH v2] RISC-V: Support RVVDImode for avg3_floor auto vect

2025-07-14 Thread pan2 . li
From: Pan Li The avg3_floor pattern uses add and shift RTL with the DOUBLE_TRUNC mode iterator. That is, the RVVDImode iterator generates avg3rvvsimode_floor, so only element sizes QI, HI and SI are allowed. Thus, this patch adds DImode support via the standard name, with the it

[PATCH v1] RISC-V: Support RVVDImode for avg3_floor auto vect

2025-07-14 Thread pan2 . li
From: Pan Li The avg3_floor pattern uses add and shift RTL with the DOUBLE_TRUNC mode iterator. That is, the RVVDImode iterator generates avg3rvvsimode_floor, so only element sizes QI, HI and SI are allowed. Thus, this patch adds DImode support via the standard name, with the it
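
For 64-bit elements there is presumably no 128-bit vector element mode to widen into, so the widen-add-shift form cannot be used as-is. A well-known way to get the same floor and ceil averages without a wider type relies on two identities: a + b = 2(a & b) + (a ^ b) and a + b = 2(a | b) - (a ^ b). The scalar C sketch below only illustrates that arithmetic. It is not necessarily the sequence the avg3_floor or avg3_ceil patches expand to, and the function names are hypothetical.

```c
#include <stdint.h>

/* floor ((a + b) / 2) without overflow, using a + b == 2 * (a & b) + (a ^ b).
   In GCC, >> on a negative int64_t is arithmetic, i.e. floor division by 2. */
int64_t avg_floor_i64 (int64_t a, int64_t b)
{
  return (a & b) + ((a ^ b) >> 1);
}

/* ceil ((a + b) / 2) without overflow, using a + b == 2 * (a | b) - (a ^ b). */
int64_t avg_ceil_i64 (int64_t a, int64_t b)
{
  return (a | b) - ((a ^ b) >> 1);
}
```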

[PATCH v2 2/2] RISC-V: Add testcase for rv32 SAT_MUL from uint64

2025-07-12 Thread pan2 . li
From: Pan Li Add the run and asm test cases for rv32 SAT_MUL, with widening mul from uint8_t, uint16_t and uint32_t to uint64_t. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test. * gcc.ta

[PATCH v2 1/2] Match: Refine the widen mul check for SAT_MUL pattern

2025-07-12 Thread pan2 . li
From: Pan Li A widening mul takes an N-bit source type to a 2N-bit destination type. The previous check only focused on HOST_WIDE_INT and did not work for QI => HI, HI => SI and SI => DI. Thus, refine the widening mul precision check so that the destination has twice the bits of the input. gcc/ChangeLog: * m

[PATCH v2 0/2] Match: Refine the widen mul check for SAT_MUL pattern

2025-07-12 Thread pan2 . li
From: Pan Li A widening mul takes an N-bit source type to a 2N-bit destination type. The previous check only focused on HOST_WIDE_INT and did not work for QI => HI, HI => SI and SI => DI. Thus, refine the widening mul precision check, i.e. the destination must have twice the bits of the input. The test suites below are pas
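
The refined condition can be modelled roughly as follows. The post does not show the actual match.pd code, so this is only an approximation of the rule "destination precision equals twice the source precision". That rule is enough because a 2N-bit product of two N-bit unsigned values can never overflow: (2^N - 1)^2 < 2^(2N). The function and macro names below are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the refined check: accept the widening mul only
   when the destination has exactly twice the bits of the source. */
bool is_widen_mul_prec (unsigned src_prec, unsigned dest_prec)
{
  return dest_prec == 2 * src_prec;
}

/* Precision in bits of type T (assumes no padding bits). */
#define PREC(T) ((unsigned) (CHAR_BIT * sizeof (T)))

/* SI => DI widening mul: the 64-bit product of two 32-bit values is exact. */
uint64_t widen_mul_u32 (uint32_t a, uint32_t b)
{
  return (uint64_t) a * (uint64_t) b;
}
```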

[PATCH v1 2/2] RISC-V: Add testcase for rv32 SAT_MUL from uint64

2025-07-10 Thread pan2 . li
From: Pan Li Add the run and asm test cases for rv32 SAT_MUL, with widening mul from uint8_t, uint16_t and uint32_t to uint64_t. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_u_mul-1-u16-from-u64.c: New test. * gcc.target/riscv/sat/sat_u_mul-1-u32-from-u64.c: New test. * gcc.ta

[PATCH v1 1/2] Match: Leverage BITS_PER_WORD for unsigned SAT_MUL pattern

2025-07-10 Thread pan2 . li
From: Pan Li The widening mul has a different source type on different platforms, such as rv32 or rv64. On rv32 the source of the widening mul is 32 bits, while on rv64 it is 64 bits. Thus, using HOST_WIDE_INT is not correct and results in pattern match failures on 32-bit systems like rv32. Thus, levera

[PATCH v1 0/2] Refine the unsigned SAT_MUL for 32-bits like rv32

2025-07-10 Thread pan2 . li
From: Pan Li The widening mul has a different source type on different machines, such as rv32 or rv64. Previously the SAT_MUL pattern did not work well for backends like rv32, so we refine its precision check using BITS_PER_WORD. The test suites below passed for this patch: 1. The

[PATCH v1] RISCV: Remove the v extension requirement for sat scalar run test

2025-07-08 Thread pan2 . li
From: Pan Li The sat scalar run test should not require the V extension, so require rv32 || rv64 instead of riscv_v. The test suites below passed for this patch series. * The rv64gcv full regression test. * The rv32gcv full regression test. gcc/testsuite/ChangeLog:

[PATCH v1] RISC-V: Disable uint128_t testcase of SAT_MUL when rv32

2025-07-07 Thread pan2 . li
From: Pan Li rv32 does not support unsigned __int128, so testing fails with an error like the one below. error: '__int128' is not supported on this target. Thus, we disable the uint128_t related tests on rv32. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add xlen check fo
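
GCC predefines `__SIZEOF_INT128__` only on targets where `__int128` is available, for example rv64 but not rv32. One way to keep a uint128_t-based helper from breaking an rv32 build is to guard it on that macro. This sketch shows only the general idea and is not the xlen check the patch adds to sat_arith.h. The names `HAVE_UINT128` and `sat_u_mul_u64` are hypothetical, and the fallback path uses GCC's `__builtin_mul_overflow`.

```c
#include <stdint.h>

#if defined (__SIZEOF_INT128__)
typedef unsigned __int128 uint128_t;
#define HAVE_UINT128 1
#else
#define HAVE_UINT128 0
#endif

/* Unsigned saturating 64-bit multiply: returns UINT64_MAX on overflow. */
uint64_t sat_u_mul_u64 (uint64_t a, uint64_t b)
{
#if HAVE_UINT128
  /* The 128-bit product is exact; clamp it to the 64-bit range. */
  uint128_t x = (uint128_t) a * (uint128_t) b;
  return x > UINT64_MAX ? UINT64_MAX : (uint64_t) x;
#else
  /* No __int128 (e.g. rv32): detect overflow with the GCC builtin instead. */
  uint64_t r;
  return __builtin_mul_overflow (a, b, &r) ? UINT64_MAX : r;
#endif
}
```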

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vssub.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-07-07 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vssub.vv combine to vssub.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vssub.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-07-07 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vssub.vv combine to vssub.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto. * gc

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vssub.vv to vssub.vx on GR2VR cost

2025-07-07 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vssub.vv into vssub.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if th

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vssub.vv to vssub.vx on GR2VR cost

2025-07-07 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vssub.vv into vssub.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:
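
Below is a scalar reference for signed saturating subtract (the vssub semantics), together with a loop in which one operand is the same scalar for every element. When such a loop is vectorized, the scalar is broadcast (vec_duplicate) and the op becomes vssub.vv. Late-combine can then fold that into vssub.vx when the GR2VR cost allows. The loop and names are hypothetical; the post does not show the testsuite's case 0 helpers. Whether a given GCC version recognizes this exact idiom as a signed SAT_SUB is also not guaranteed.

```c
#include <stdint.h>

/* Signed saturating subtract: compute exactly in 32 bits, then clamp. */
int16_t sat_s_sub_i16 (int16_t a, int16_t b)
{
  int32_t r = (int32_t) a - (int32_t) b;
  if (r > INT16_MAX)
    return INT16_MAX;
  if (r < INT16_MIN)
    return INT16_MIN;
  return (int16_t) r;
}

/* Hypothetical loop: the second operand is the same scalar x everywhere. */
void vx_ssub_i16 (int16_t *restrict out, const int16_t *restrict in,
                  int16_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = sat_s_sub_i16 (in[i], x);
}
```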

[PATCH v3 3/3] RISC-V: Add test for vec_duplicate + vsadd.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-07-03 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vsadd.vv combine to vsadd.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto. * gc

[PATCH v3 2/3] RISC-V: Add test for vec_duplicate + vsadd.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-07-03 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vsadd.vv combine to vsadd.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.

[PATCH v3 1/3] RISC-V: Combine vec_duplicate + vsadd.vv to vsadd.vx on GR2VR cost

2025-07-03 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vsadd.vv into vsadd.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if th

[PATCH v3 0/3] RISC-V: Combine vec_duplicate + vsadd.vv to vsadd.vx on GR2VR cost

2025-07-03 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vsadd.vv into vsadd.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:
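
Here is a scalar reference for signed saturating add (the vsadd semantics) with a broadcast scalar operand. The sketch uses GCC's `__builtin_add_overflow` to detect overflow. The function names are hypothetical and the loop is only an illustrative shape, not the testsuite's code.

```c
#include <stdint.h>

/* Signed saturating add: overflow is only possible when a and b have the
   same sign, so on overflow saturate toward that sign. */
int8_t sat_s_add_i8 (int8_t a, int8_t b)
{
  int8_t r;
  if (!__builtin_add_overflow (a, b, &r))
    return r;
  return a < 0 ? INT8_MIN : INT8_MAX;
}

/* Hypothetical loop: every element is added to the same scalar x. */
void vx_sadd_i8 (int8_t *restrict out, const int8_t *restrict in,
                 int8_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = sat_s_add_i8 (in[i], x);
}
```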

[PATCH v3 4/4] RISC-V: Add test cases for unsigned scalar SAT_MUL from uint128_t

2025-07-01 Thread pan2 . li
From: Pan Li Add run and tree-optimized dump checks for unsigned scalar SAT_MUL from uint128_t. gcc/testsuite/ChangeLog: * gcc.target/riscv/sat/sat_arith.h: Add test helper macros. * gcc.target/riscv/sat/sat_arith_data.h: Add test data for run test. * gcc.target/riscv/

[PATCH v3 3/4] RISC-V: Implement unsigned scalar SAT_MUL from uint128_t

2025-07-01 Thread pan2 . li
From: Pan Li This patch implements unsigned scalar SAT_MUL from uint128_t, i.e.: NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint128_t x = (uint128_t)a * (uint128_t)b; NT max = -1; if (x > (uint128_t)(max)) return max; else

[PATCH v3 0/4] Support unsigned scalar SAT_MUL from uint128_t

2025-07-01 Thread pan2 . li
From: Pan Li This patch series supports unsigned SAT_MUL with the help of uint128_t, i.e.: NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint128_t x = (uint128_t)a * (uint128_t)b; NT max = -1; if (x > (uint128_t)(max)) return max; else return
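
The form-1 shape above is truncated in this listing. Below is a self-contained version that we completed with `return (NT) x;` in the else branch, instantiated for several unsigned types. It assumes a target where GCC provides `unsigned __int128`, such as x86_64 or rv64 (rv32 does not, as noted elsewhere in this list). The macro name DEF_SAT_U_MUL_FMT_1 is hypothetical. The uint8_t results match the SAT_MUL (1, 127), (2, 127) and (3, 127) examples given for IFN_SAT_MUL.

```c
#include <stdint.h>

/* Assumption: the target supports unsigned __int128. */
typedef unsigned __int128 uint128_t;

/* Multiply in 128 bits; if the exact product exceeds NT's maximum, return
   that maximum (NT max = -1 gives all ones for an unsigned NT). */
#define DEF_SAT_U_MUL_FMT_1(NT)                      \
  NT __attribute__((noinline))                       \
  sat_u_mul_##NT##_fmt_1 (NT a, NT b)                \
  {                                                  \
    uint128_t x = (uint128_t) a * (uint128_t) b;     \
    NT max = -1;                                     \
    if (x > (uint128_t) (max))                       \
      return max;                                    \
    else                                             \
      return (NT) x;                                 \
  }

DEF_SAT_U_MUL_FMT_1 (uint8_t)
DEF_SAT_U_MUL_FMT_1 (uint16_t)
DEF_SAT_U_MUL_FMT_1 (uint32_t)
DEF_SAT_U_MUL_FMT_1 (uint64_t)
```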

[PATCH v3 2/4] Widening-Mul: Support unsigned scalar SAT_MUL form 1

2025-07-01 Thread pan2 . li
From: Pan Li This patch tries to match SAT_MUL during the widening-mul pass, i.e. the pattern below. NT __attribute__((noinline)) sat_u_mul_##NT##_fmt_1 (NT a, NT b) { uint128_t x = (uint128_t)a * (uint128_t)b; NT max = -1; if (x > (uint128_t)(max)) return max;

[PATCH v3 1/4] Internal-fn: Introduce new IFN_SAT_MUL for unsigned int

2025-07-01 Thread pan2 . li
From: Pan Li This patch adds the middle-end representation for unsigned saturating mul, i.e. the result of the mul is set to the max value on overflow. Taking uint8_t as an example, we have: * SAT_MUL (1, 127) => 127. * SAT_MUL (2, 127) => 254. * SAT_MUL (3, 127) => 255. * SAT_MUL (25

[PATCH v3 4/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vssubu.vv combine to vssubu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vssubu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v3 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vssubu.vv combine to vssubu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v3 2/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-27 Thread pan2 . li
From: Pan Li The cost model change sets the default cost of vx to 2, so reconcile the asm checks with this change. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Update the asm check due to cost model change. * gcc.target/ri

[PATCH v3 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vssubu.vv into vssubu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if

[PATCH v3 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vssubu.vv into vssubu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:
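
For the unsigned variant, vssubu clamps at zero instead of wrapping. Below are two equivalent scalar C spellings, a branchy one and a branchless mask one. Both are illustrative only: GCC's match patterns, not this sketch, decide which idioms are recognized as SAT_SUB. The loop shape and the names are hypothetical.

```c
#include <stdint.h>

/* Branchy form: a - b when it does not underflow, else 0. */
uint32_t sat_u_sub_u32_branch (uint32_t a, uint32_t b)
{
  return a >= b ? a - b : 0;
}

/* Branchless form: -(uint32_t) (a >= b) is all ones when a >= b, else 0. */
uint32_t sat_u_sub_u32_mask (uint32_t a, uint32_t b)
{
  return (a - b) & -(uint32_t) (a >= b);
}

/* Hypothetical loop subtracting the same scalar x from every element. */
void vx_ussub_u32 (uint32_t *restrict out, const uint32_t *restrict in,
                   uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = sat_u_sub_u32_mask (in[i], x);
}
```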

[PATCH v2 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vssubu.vv combine to vssubu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v2 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vssubu.vv into vssubu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if

[PATCH v2 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-27 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vssubu.vv combine to vssubu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vssubu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v2 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-27 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vssubu.vv into vssubu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:

[PATCH v2 4/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-27 Thread pan2 . li
From: Pan Li The cost model change sets the default cost of vx to 2, so reconcile the asm checks with this change. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Update the asm check due to cost model change. * gcc.target/ri

[PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-26 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vssubu.vv combine to vssubu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vssubu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-26 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vssubu.vv combine to vssubu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vssubu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v1 4/4] RISC-V: Reconcile the existing test due to cost model change

2025-06-26 Thread pan2 . li
From: Pan Li The cost model change sets the default cost of vx to 2, so reconcile the asm checks with this change. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/sat/vec_sat_u_sub_trunc-1-u16.c: Update the asm check due to cost model change. * gcc.target/ri

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-26 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vssubu.vv into vssubu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vssubu.vv to vssubu.vx on GR2VR cost

2025-06-26 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vssubu.vv into vssubu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-21 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vsaddu.vv combine to vsaddu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vsaddu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vsaddu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-21 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vsaddu.vv combine to vsaddu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

2025-06-20 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vsaddu.vv into vsaddu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vsaddu.vv to vsaddu.vx on GR2VR cost

2025-06-20 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vsaddu.vv into vsaddu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:
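
Here is a scalar reference for unsigned saturating add (the vsaddu semantics) written in a common branchless style, plus a loop with a broadcast scalar operand. The idiom and names are illustrative only. The sketch makes no claim about which exact forms GCC's SAT_ADD matching accepts.

```c
#include <stdint.h>

/* Unsigned saturating add: if the wrapped sum is smaller than an operand,
   the add overflowed, and OR-ing with all ones yields UINT32_MAX. */
uint32_t sat_u_add_u32 (uint32_t a, uint32_t b)
{
  uint32_t sum = a + b;
  return sum | -(uint32_t) (sum < a);
}

/* Hypothetical loop: every element is added to the same scalar x. */
void vx_usadd_u32 (uint32_t *restrict out, const uint32_t *restrict in,
                   uint32_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = sat_u_add_u32 (in[i], x);
}
```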

[PATCH v2] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-20 Thread pan2 . li
From: Pan Li There is an ICE in the expand pass, with a backtrace similar to the one below. during RTL pass: expand red.c: In function 'main': red.c:20:5: internal compiler error: in require, at machmode.h:323 20 | int main() { | ^~~~ 0x2e0b1d6 internal_error(char const*, ...) ../../../gcc/

[PATCH v1] RISC-V: Fix ICE for expand_select_vldi [PR120652]

2025-06-19 Thread pan2 . li
From: Pan Li There is an ICE in the expand pass, with a backtrace similar to the one below. during RTL pass: expand red.c: In function 'main': red.c:20:5: internal compiler error: in require, at machmode.h:323 20 | int main() { | ^~~~ 0x2e0b1d6 internal_error(char const*, ...) ../../../gcc/

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vminu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-19 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vminu.vv combine to vminu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vminu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vminu.vv to vminu.vx on GR2VR cost

2025-06-19 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vminu.vv into vminu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vminu.vv to vminu.vx on GR2VR cost

2025-06-19 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vminu.vv into vminu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if th

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vminu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-19 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vminu.vv combine to vminu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vmin.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-16 Thread pan2 . li
From: Pan Li Add asm dump checks and run tests for the vec_duplicate + vmin.vv combine to vmin.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vmin.vv to vmin.vx on GR2VR cost

2025-06-16 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vmin.vv into vmin.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0: |
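
The vmin and vminu series differ only in whether the comparison is signed, so the same bit patterns can give different answers. The same holds for vmax and vmaxu. Below is a small scalar illustration with hypothetical function names and a hypothetical broadcast-scalar loop.

```c
#include <stdint.h>

/* Signed minimum, as vmin computes per element. */
int8_t min_i8 (int8_t a, int8_t b)
{
  return a < b ? a : b;
}

/* Unsigned minimum, as vminu computes per element. */
uint8_t min_u8 (uint8_t a, uint8_t b)
{
  return a < b ? a : b;
}

/* Hypothetical loop: min of every element with the same scalar x. */
void vx_min_i8 (int8_t *restrict out, const int8_t *restrict in,
                int8_t x, unsigned n)
{
  for (unsigned i = 0; i < n; i++)
    out[i] = min_i8 (in[i], x);
}
```

With the bit pattern 0xFF, `min_i8 (-1, 1)` is -1, but `min_u8 (0xFF, 1)` is 1.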

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vmin.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-16 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vmin.vv combine to vmin.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check for vmin.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vmin.vv to vmin.vx on GR2VR cost

2025-06-16 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vmin.vv into vmin.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if the

[PATCH v1] RISC-V: Refine VX combine test case 0 to avoid code duplication

2025-06-15 Thread pan2 . li
From: Pan Li The case 0 def functions for vx combine are mostly the same across the different test files. Thus, rearrange them into one place to avoid code duplication. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Leverage helper macros to avoid code d

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-14 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vmaxu.vv combine to vmaxu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vmaxu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost

2025-06-14 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vmaxu.vv into vmaxu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1, 2 or 15 in the tests. There are two cases for the combine: Case 0:

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vmaxu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-14 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vmaxu.vv combine to vmaxu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vmaxu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/v

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vmaxu.vv to vmaxu.vx on GR2VR cost

2025-06-14 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vmaxu.vv into vmaxu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if th

[PATCH v1 3/5] RISC-V: Add test for vec_dup + vmax.vv combine case 0 with max func 1 and GR2VR cost 0, 2 and 15

2025-06-12 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vmax.vv combine to vmax.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for max func 1 vmax.vx combine. * gcc.target/riscv/rvv/autovec

[PATCH v1 1/5] RISC-V: Combine vec_duplicate + vmax.vv to vmax.vx on GR2VR cost

2025-06-12 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vmax.vv into vmax.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if the

[PATCH v1 2/5] RISC-V: Add test for vec_dup + vmax.vv combine case 0 with max func 0 and GR2VR cost 0, 2 and 15

2025-06-11 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vmax.vv combine to vmax.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for max func 1 vmax.vx combine. * gcc.target/riscv/rvv/autovec

[PATCH v1 0/5] RISC-V: Combine vec_duplicate + vmax.vv to vmax.vx on GR2VR cost

2025-06-11 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vmax.vv into vmax.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1 or 15 in the tests. There are two cases for the combine: Case 0: | .

[PATCH v1 5/5] RISC-V: Add test for vec_dup + vmax.vv combine case 1 with max func 1 and GR2VR cost 0, 1 and 2

2025-06-11 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vmax.vv combine to vmax.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check for vmax.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-

[PATCH v1 4/5] RISC-V: Add test for vec_dup + vmax.vv combine case 1 with max func 0 and GR2VR cost 0, 1 and 2

2025-06-11 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vmax.vv combine to vmax.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check for vmax.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vremu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-09 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vremu.vv combine to vremu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vremu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-

[PATCH v1 2/4] RISC-V: Reconcile the existing test for vremu.vx combine

2025-06-09 Thread pan2 . li
From: Pan Li Some existing vrem-related tests need their asm checks adjusted due to the cost model. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Adjust the asm check for vremu. * gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto. S

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vremu.vv to vremu.vx on GR2VR cost

2025-06-09 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vremu.vv into vremu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if th

[PATCH v1 4/4] RISC-V: Add test for vec_duplicate + vremu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-09 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vremu.vv combine to vremu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vremu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vremu.vv to vremu.vx on GR2VR cost

2025-06-09 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vremu.vv into vremu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1 or 15 in the tests. There are two cases for the combine: Case 0: |

[PATCH v1 4/4] RISC-V: Add test for vec_duplicate + vrem.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-08 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vrem.vv combine to vrem.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check for vrem.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vrem.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-08 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vrem.vv combine to vrem.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check for vrem.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx-1

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vrem.vv to vrem.vx on GR2VR cost

2025-06-08 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vrem.vv into vrem.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1 or 15 in the tests. There are two cases for the combine: Case 0: | .
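
One subtlety arises when checking vrem.vx and vremu.vx results against C. C leaves `x % 0` undefined, and for signed types also `INT_MIN % -1`. RVV follows the scalar M extension and defines both cases: a remainder by zero returns the dividend, and the signed overflow case returns 0. Below is a hypothetical per-element reference model that makes those cases explicit. It is our sketch, not code from the testsuite.

```c
#include <stdint.h>

/* Per-element model of vrem.vx: vs2[i] % x, with RVV's defined edge cases. */
int32_t rvv_rem_i32 (int32_t dividend, int32_t divisor)
{
  if (divisor == 0)
    return dividend;                 /* remainder by zero: the dividend */
  if (dividend == INT32_MIN && divisor == -1)
    return 0;                        /* signed overflow case: 0 */
  return dividend % divisor;         /* truncating; sign follows dividend */
}

/* Per-element model of vremu.vx. */
uint32_t rvv_remu_u32 (uint32_t dividend, uint32_t divisor)
{
  return divisor == 0 ? dividend : dividend % divisor;
}
```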

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vrem.vv to vrem.vx on GR2VR cost

2025-06-08 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vrem.vv into vrem.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if the

[PATCH v1 2/4] RISC-V: Reconcile the existing test for vrem.vx combine

2025-06-08 Thread pan2 . li
From: Pan Li Some existing vrem-related tests need their asm checks adjusted due to the cost model. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vrem-rv32gcv.c: Adjust the asm check for vrem. * gcc.target/riscv/rvv/autovec/binop/vrem-rv64gcv.c: Ditto. Si

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vdivu.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-06 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vdivu.vv combine to vdivu.vx, with GR2VR cost 0, 1 and 2. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Add asm check for vdivu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/vx

[PATCH v1 4/4] RISC-V: Reconcile the existing test for vdivu.vx combine

2025-06-06 Thread pan2 . li
From: Pan Li Some existing vdiv-related tests need their asm checks adjusted due to the cost model. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Adjust the asm check for vdivu. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditt

[PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vdivu.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-06 Thread pan2 . li
From: Pan Li Add asm dump check tests for the vec_duplicate + vdivu.vv combine to vdivu.vx, with GR2VR cost 0, 2 and 15. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check for vdivu.vx combine. * gcc.target/riscv/rvv/autovec/vx_vf/v

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vdivu.vv to vdivu.vx on GR2VR cost

2025-06-06 Thread pan2 . li
From: Pan Li This patch introduces the combine of vec_dup + vdivu.vv into vdivu.vx based on the GR2VR cost. Late-combine takes place if the GR2VR cost is zero, and the combine is rejected if it is non-zero, such as 1 or 15 in the tests. There are two cases for the combine: Case 0: |

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vdivu.vv to vdivu.vx on GR2VR cost

2025-06-06 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vdivu.vv into vdivu.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR. Late-combine then takes action if the GR2VR cost is zero, and rejects the combination if th

[PATCH v1] RISC-V: Fix ICE for gcc.dg/graphite/pr33576.c with rv32gcv

2025-06-04 Thread pan2 . li
From: Pan Li RVV has no div insn of the form v2 = div (vec_dup (x), v1), so generated RTL of that shape hits the unreachable assert when the insn is expanded. This patch removes div from the binary op form (vec_dup (x), v) to avoid matching that pattern by mistake. No new test introduced as
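
Division is not commutative, which is why the swapped form cannot be matched. In RVV, vdiv.vx computes vd[i] = vs2[i] / x[rs1], so the vector element is always the dividend; there is no reversed integer divide (unlike vrsub.vx for subtraction or vfrdiv.vf for floating point). The two C helpers below are hypothetical and only show the two operand orders on scalars.

```c
#include <assert.h>
#include <stdint.h>

/* div (v, vec_dup (x)): element divided by scalar, the vdiv.vx shape.  */
int32_t
elem_div_scalar (int32_t v, int32_t x)
{
  assert (x != 0);  /* division by zero is undefined in C */
  return v / x;
}

/* div (vec_dup (x), v): scalar divided by element.  No RVV integer
   .vx instruction takes its operands in this order.  */
int32_t
scalar_div_elem (int32_t x, int32_t v)
{
  assert (v != 0);
  return x / v;
}
```

With the same operands (element 12, scalar 3) the two orders give 4 and 0, so a pattern that treats them as interchangeable would be wrong.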

[PATCH v1] RISC-V: Leverage get_vector_binary_rtx_cost to avoid code dup [NFC]

2025-06-03 Thread pan2 . li
From: Pan Li Some similar code can be wrapped into the function get_vector_binary_rtx_cost, so leverage this function to avoid code duplication. The test suites below passed for this patch series. * The rv64gcv full regression test. gcc/ChangeLog: * config/riscv/riscv.cc (get_vector_bin

[PATCH v1 1/4] RISC-V: Combine vec_duplicate + vdiv.vv to vdiv.vx on GR2VR cost

2025-06-02 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vdiv.vv into vdiv.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR: the late-combine pass takes action if the GR2VR cost is zero, and rejects the combination if the

[PATCH v1 2/4] RISC-V: Add test for vec_duplicate + vdiv.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-06-02 Thread pan2 . li
From: Pan Li Add asm dump check tests for combining vec_duplicate + vdiv.vv into vdiv.vx, with GR2VR costs of 0, 2 and 15. The test suites below passed for this patch. * The rv64gcv full regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add a

[PATCH v1 3/4] RISC-V: Add test for vec_duplicate + vdiv.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-06-02 Thread pan2 . li
From: Pan Li Add asm dump check tests for combining vec_duplicate + vdiv.vv into vdiv.vx, with GR2VR costs of 0, 1 and 2. The test suites below passed for this patch. * The rv64gcv full regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add as

[PATCH v1 4/4] RISC-V: Reconcile the existing test for vdiv.vx combine

2025-06-02 Thread pan2 . li
From: Pan Li Some existing vdiv-related tests need their asm checks adjusted. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Adjust the asm check for vdiv. * gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv.c: Ditto. * gcc.ta

[PATCH v1 0/4] RISC-V: Combine vec_duplicate + vdiv.vv to vdiv.vx on GR2VR cost

2025-06-02 Thread pan2 . li
From: Pan Li This patch introduces combining vec_dup + vdiv.vv into vdiv.vx based on the GR2VR cost. The late-combine pass performs the combine if the GR2VR cost is zero and rejects it if the cost is non-zero (for example 1 or 15 in the tests). There are two cases for the combine: Case 0: | .

[PATCH v1] RISC-V: Fix line too long format issue for autovec.md [NFC]

2025-05-30 Thread pan2 . li
From: Pan Li While working on the avg_ceil patches, I noticed even more over-long lines in autovec.md, so fix those format issues. gcc/ChangeLog: * config/riscv/autovec.md: Fix over-long lines for various patterns. Signed-off-by: Pan Li --- gcc/config/riscv/autovec.md | 54

[PATCH v1 3/3] RISC-V: Add test cases for avg_ceil vaadd implementation

2025-05-29 Thread pan2 . li
From: Pan Li Add asm and run test cases for the avg_ceil vaadd implementation. The test suites below passed for this patch series. * The rv64gcv full regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/avg.h: Add test helper macros. * gcc.target/riscv/rvv/au

[PATCH v1 1/3] RISC-V: Leverage vaadd.vv for signed standard name avg_ceil

2025-05-29 Thread pan2 . li
From: Pan Li The avg_ceil rounds towards +inf, and vaadd.vv with the rnu rounding mode matches these semantics exactly. From the RVV spec, the fixed-point vaadd.vv with rnu computes roundoff_signed(v, d) = (signed(v) >> d) + r, where r = v[d - 1]. For vaadd, d = 1, so we have roundoff_signed(v, 1) = (signed(v
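
Working the d = 1 case through: r is just the low bit of the wide sum, and (v >> 1) + v[0] equals (v + 1) >> 1 for every integer v, which is exactly avg_ceil. The C sketch below is a scalar model under that reading, assuming int32_t elements with the sum formed in int64_t; it is not GCC's implementation, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* floor (v / 2), written without >> because right-shifting a negative
   value is implementation-defined in C.  */
static int64_t
asr1 (int64_t v)
{
  return (v - (v & 1)) / 2;
}

/* Model of vaadd with vxrm = rnu and d = 1:
   roundoff_signed (v, 1) = (signed (v) >> 1) + r, where r = v[0].  */
int32_t
vaadd_rnu (int32_t a, int32_t b)
{
  int64_t v = (int64_t) a + b;   /* wide sum, cannot overflow */
  int64_t r = v & 1;             /* bit v[d - 1] with d = 1 */
  return (int32_t) (asr1 (v) + r);
}

/* Reference avg_ceil: (narrow) (((wide) a + (wide) b + 1) >> 1).  */
int32_t
avg_ceil (int32_t a, int32_t b)
{
  return (int32_t) asr1 ((int64_t) a + b + 1);
}

/* Compare the two over a few edge-case inputs.  */
void
check_vaadd_rnu_is_avg_ceil (void)
{
  static const int32_t vals[] = { INT32_MIN, -7, -3, -1, 0, 1, 2, 5,
                                  INT32_MAX };
  const int n = (int) (sizeof vals / sizeof vals[0]);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      assert (vaadd_rnu (vals[i], vals[j]) == avg_ceil (vals[i], vals[j]));
}
```

For example, -3 + 0 gives (-3 >> 1) + 1 = -2 + 1 = -1, which is ceil(-1.5).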

[PATCH v1 2/3] RISC-V: Reconcile the existing test for avg_ceil

2025-05-29 Thread pan2 . li
From: Pan Li Some existing avg_ceil tests need updating now that vaadd.vv is used directly. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/avg-4.c: Update asm check to vaadd. * gcc.target/riscv/rvv/autovec/vls/avg-5.c: Ditto. * gcc.target/ris

[PATCH v1 0/3] Refine the avg_ceil with fixed point vaadd

2025-05-29 Thread pan2 . li
From: Pan Li As with avg_floor, avg_ceil maps onto vaadd.vv: avg_ceil rounds towards +inf, and vaadd.vv with the rnu rounding mode matches these semantics exactly. From the RVV spec, the fixed-point vaadd.vv with rnu computes roundoff_signed(v, d) = (signed(v) >> d) + r, where r = v[d - 1]. For vaadd, d = 1, so we have roundof

[PATCH v1 3/3] RISC-V: Add test for vec_duplicate + vmul.vv combine case 1 with GR2VR cost 0, 1 and 2

2025-05-28 Thread pan2 . li
From: Pan Li Add asm dump check tests for combining vec_duplicate + vmul.vv into vmul.vx, with GR2VR costs of 0, 1 and 2. The test suites below passed for this patch. * The rv64gcv full regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add as

[PATCH v1 2/3] RISC-V: Add test for vec_duplicate + vmul.vv combine case 0 with GR2VR cost 0, 2 and 15

2025-05-28 Thread pan2 . li
From: Pan Li Add asm dump check tests for combining vec_duplicate + vmul.vv into vmul.vx, with GR2VR costs of 0, 2 and 15. The test suites below passed for this patch. * The rv64gcv full regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add a

[PATCH v1 1/3] RISC-V: Combine vec_duplicate + vmul.vv to vmul.vx on GR2VR cost

2025-05-28 Thread pan2 . li
From: Pan Li This patch combines vec_duplicate + vmul.vv into vmul.vx, as in the example code below. The related pattern depends on the cost of vec_duplicate from GR2VR: the late-combine pass takes action if the GR2VR cost is zero, and rejects the combination if the

[PATCH v1 0/3] RISC-V: Combine vec_duplicate + vmul.vv to vmul.vx on GR2VR cost

2025-05-28 Thread pan2 . li
From: Pan Li This patch introduces combining vec_dup + vmul.vv into vmul.vx based on the GR2VR cost. The late-combine pass performs the combine if the GR2VR cost is zero and rejects it if the cost is non-zero (for example 1 or 15 in the tests). There are two cases for the combine: Case 0: | .

[PATCH v2 3/3] RISC-V: Add test cases for avg_floor vaadd implementation

2025-05-27 Thread pan2 . li
From: Pan Li Add asm and run test cases for the avg_floor vaadd implementation. The test suites below passed for this patch series. * The rv64gcv full regression test. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/avg.h: New test. * gcc.target/riscv/rvv/autovec/avg_dat

[PATCH v2 2/3] RISC-V: Reconcile the existing test for avg_floor

2025-05-27 Thread pan2 . li
From: Pan Li Some existing avg_floor tests need updating now that vaadd.vv is used directly. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/autovec/vls/avg-1.c: Update asm check to vaadd. * gcc.target/riscv/rvv/autovec/vls/avg-2.c: Ditto. * gcc.target/ris

[PATCH v2 1/3] RISC-V: Leverage vaadd.vv for signed standard name avg_floor

2025-05-27 Thread pan2 . li
From: Pan Li The signed avg_floor exactly matches the semantics of the fixed-point RVV insn vaadd with round-down (rdn). Thus, leverage it directly to implement avg_floor. The RVV spec is somewhat unclear about the difference between floating-point and fixed-point rounding that disc
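
For reference, here is a scalar C model of vaadd under each fixed-point rounding mode, using the rounding increment r from the RVV spec's vxrm table with d = 1. With rdn, r = 0, so the result is the floor of the halved wide sum, i.e. avg_floor; with rnu it becomes avg_ceil instead. This is an illustrative sketch assuming int32_t elements; the enum and function names are hypothetical and not taken from GCC.

```c
#include <assert.h>
#include <stdint.h>

/* vxrm encodings from the RVV fixed-point rounding-mode table.  */
enum vxrm { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* floor (v / 2), written without >> because right-shifting a negative
   value is implementation-defined in C.  */
static int64_t
asr1 (int64_t v)
{
  return (v - (v & 1)) / 2;
}

/* Model of vaadd: roundoff_signed (v, 1) = (signed (v) >> 1) + r, where
   v = a + b is formed in int64_t and r depends on vxrm (d = 1, so the
   v[d-2:0] term is empty):
     rnu: r = v[0]
     rne: r = v[0] & v[1]
     rdn: r = 0
     rod: r = !v[1] & v[0]  */
int32_t
vaadd (int32_t a, int32_t b, enum vxrm mode)
{
  int64_t v = (int64_t) a + b;
  int64_t bit0 = (v & 1) != 0;
  int64_t bit1 = (v & 2) != 0;
  int64_t r = 0;

  switch (mode)
    {
    case VXRM_RNU: r = bit0; break;
    case VXRM_RNE: r = bit0 & bit1; break;
    case VXRM_RDN: r = 0; break;
    case VXRM_ROD: r = (!bit1) & bit0; break;
    }
  return (int32_t) (asr1 (v) + r);
}

/* Reference avg_floor: (narrow) (((wide) a + (wide) b) >> 1).  */
int32_t
avg_floor (int32_t a, int32_t b)
{
  return (int32_t) asr1 ((int64_t) a + b);
}

/* With rdn the increment is always 0, so vaadd must equal avg_floor.  */
void
check_vaadd_rdn_is_avg_floor (void)
{
  static const int32_t vals[] = { INT32_MIN, -7, -3, -1, 0, 1, 2, 5,
                                  INT32_MAX };
  const int n = (int) (sizeof vals / sizeof vals[0]);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      assert (vaadd (vals[i], vals[j], VXRM_RDN)
              == avg_floor (vals[i], vals[j]));
}
```

The modes only differ when the wide sum is odd: for -3 + 0, rdn gives -2 (avg_floor) while rnu gives -1 (avg_ceil).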
