Re: [PATCH] target/riscv: rvv: Minimum VLEN needs to respect V/Zve extensions

2025-07-07 Thread Max Chou
On 6/27/25 10:21 AM, Max Chou wrote: > > According to the RISC-V instruction set manual, the minimum VLEN needs > > to respect the following extensions: > > > >Extension Minimum VLEN > > * V 128 > > * Zve64[d|f|x] 64 > > * Zve32[f|x

Re: [PATCH v2 2/3] target/riscv: rvv: Apply vext_check_input_eew to vector reduction instructions

2025-07-07 Thread Max Chou
Hi Nutty, Thanks for the suggestion. I'll provide a new version including the new description and a fix for another EEW issue. Thanks, Max On Tue, Jul 1, 2025 at 2:43 PM Nutty Liu wrote: > On 6/27/2025 9:20 PM, Max Chou wrote: > > From: Anton Blanchard > > >

[PATCH v2 0/3] Fix some more RVV source overlap issues

2025-06-27 Thread Max Chou
This patchset is based on the v1 provided by Anton Blanchard with the following update: * Add the missing input EEW checking rule for widening vector reduction instructions. Reference: * v1: 20250415043207.3512209-1-ant...@tenstorrent.com Anton Blanchard (3): target/riscv: rvv: Apply vext_check_in

[PATCH] target/riscv: rvv: Fix missing exit TB flow for ldff_trans

2025-06-27 Thread Max Chou
According to the V spec, the vector fault-only-first load instructions may change the VL CSR. So the ldff_trans TCG translation function should generate the lookup_and_goto_ptr flow, as the vsetvl/vsetvli translation functions do, to make sure the vl_eq_vlmax TB flag is correct. Signed-off-by: Max Chou

[PATCH] target/riscv: rvv: Minimum VLEN needs to respect V/Zve extensions

2025-06-27 Thread Max Chou
According to the RISC-V instruction set manual, the minimum VLEN needs to respect the following extensions:

Extension        Minimum VLEN
* V              128
* Zve64[d|f|x]   64
* Zve32[f|x]     32

Signed-off-by: Max Chou --- target/riscv/tcg/tcg-cpu.c | 13 +++-- 1 file changed, 11
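The rule in this patch description can be sketched as a small lookup. This is only an illustration under stated assumptions: `VectorExts`, `min_vlen_bits` and `vlen_is_valid` are hypothetical names invented here, not the code in target/riscv/tcg/tcg-cpu.c, and the three flags stand in for whatever configuration fields the real patch consults.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extension flags (illustration only, not QEMU's CPU config). */
typedef struct {
    bool ext_v;      /* full V extension */
    bool ext_zve64;  /* any of Zve64d / Zve64f / Zve64x */
    bool ext_zve32;  /* any of Zve32f / Zve32x */
} VectorExts;

/* Return the minimum VLEN in bits implied by the enabled extensions,
 * or 0 when no vector extension is enabled. The most demanding
 * extension wins, so V is checked first. */
static uint32_t min_vlen_bits(const VectorExts *e)
{
    if (e->ext_v) {
        return 128;
    }
    if (e->ext_zve64) {
        return 64;
    }
    if (e->ext_zve32) {
        return 32;
    }
    return 0;
}

/* A requested VLEN is acceptable only if it is at least the minimum. */
static bool vlen_is_valid(const VectorExts *e, uint32_t vlen_bits)
{
    uint32_t min = min_vlen_bits(e);
    return min != 0 && vlen_bits >= min;
}
```

With these assumptions, a CPU that enables only Zve32x accepts VLEN=32, while the same VLEN is rejected once V is enabled.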

[PATCH v2 3/3] target/riscv: vadc and vsbc are vm=0 instructions

2025-06-27 Thread Max Chou
From: Anton Blanchard We were marking vadc and vsbc as vm=1 instructions, which meant vext_check_input_eew wouldn't detect mask vs source register overlaps. Signed-off-by: Anton Blanchard Reviewed-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn32.decode | 10 +- 1

[PATCH v2 2/3] target/riscv: rvv: Apply vext_check_input_eew to vector reduction instructions

2025-06-27 Thread Max Chou
From: Anton Blanchard Handle the overlap of source registers with different EEWs. Signed-off-by: Anton Blanchard Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc

[PATCH v2 1/3] target/riscv: rvv: Apply vext_check_input_eew to vector integer/fp compare instructions

2025-06-27 Thread Max Chou
From: Anton Blanchard Handle the overlap of source registers with different EEWs. Signed-off-by: Anton Blanchard Reviewed-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff

[PATCH v3 00/10] Fix RVV encoding corner cases

2025-04-08 Thread Max Chou
estions and review. Anton Blanchard (2): target/riscv: rvv: Source vector registers cannot overlap mask register target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS Max Chou (8): target/riscv: rvv: Apply vext_check_input_eew to vrgather instructions to check mismatched i

[PATCH v3 01/10] target/riscv: rvv: Source vector registers cannot overlap mask register

2025-04-08 Thread Max Chou
From: Anton Blanchard Add the relevant ISA paragraphs explaining why source (and destination) registers cannot overlap the mask register. Signed-off-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Reviewed-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans

[PATCH v3 06/10] target/riscv: rvv: Apply vext_check_input_eew to vector slide instructions(OPIVI/OPIVX)

2025-04-08 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans

[PATCH v3 05/10] target/riscv: rvv: Apply vext_check_input_eew to OPIVV/OPFVV(vext_check_sss) instructions

2025-04-08 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 1 + 1 file changed, 1 insertion(+) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b

[PATCH v3 08/10] target/riscv: rvv: Apply vext_check_input_eew to vector narrow/widen instructions

2025-04-08 Thread Max Chou
Handle the overlap of source registers with different EEWs. The vd of vector widening mul-add instructions is one of the input operands. Co-authored-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvbf16.c.inc | 9 ++- target

[PATCH v3 07/10] target/riscv: rvv: Apply vext_check_input_eew to vector integer extension instructions(OPMVV)

2025-04-08 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans

[PATCH v3 10/10] target/riscv: Fix the rvv reserved encoding of unmasked instructions

2025-04-08 Thread Max Chou
According to the v spec, the encodings of vcompress.vm and vector mask-register logical instructions with vm=0 are reserved. Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn32.decode | 18 +- 1 file changed, 9 insertions(+), 9 deletions

[PATCH v3 02/10] target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS

2025-04-08 Thread Max Chou
From: Anton Blanchard Signed-off-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Reviewed-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/target/riscv/insn_trans

[PATCH v3 04/10] target/riscv: rvv: Apply vext_check_input_eew to OPIVI/OPIVX/OPFVF(vext_check_ss) instructions

2025-04-08 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans

[PATCH v3 09/10] target/riscv: rvv: Apply vext_check_input_eew to vector indexed load/store instructions

2025-04-08 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/target/riscv

[PATCH v3 03/10] target/riscv: rvv: Apply vext_check_input_eew to vrgather instructions to check mismatched input EEWs encoding constraint

2025-04-08 Thread Max Chou
According to the v spec, a vector register cannot be used to provide source operands with more than one EEW for a single instruction. The vs1 EEW of vrgatherei16.vv is 16. Co-authored-by: Anton Blanchard Reviewed-by: Daniel Henrique Barboza Signed-off-by: Max Chou --- target/riscv/insn_trans

Re: [PATCH v2 05/12] target/riscv: rvv: Apply vext_check_input_eew to OPIVI/OPIVX/OPFVF(vext_check_ss) instructions

2025-04-07 Thread Max Chou
On 2025/4/5 5:17 PM, Daniel Henrique Barboza wrote: On 3/29/25 11:44 AM, Max Chou wrote: Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou ---   target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-   1

Re: [PATCH v2 04/12] target/riscv: rvv: Apply vext_check_input_eew to vector register gather instructions

2025-04-07 Thread Max Chou
On 2025/4/5 5:14 PM, Daniel Henrique Barboza wrote: On 3/29/25 11:44 AM, Max Chou wrote: Handle the overlap of source registers with different EEWs. The vs1 EEW of vrgatherei16.vv is 16. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Since you're marked as Author you don'

Re: [PATCH v2 03/12] target/riscv: Add vext_check_input_eew to check mismatched input EEWs encoding constraint

2025-04-07 Thread Max Chou
On 2025/4/5 5:09 PM, Daniel Henrique Barboza wrote: On 3/29/25 11:44 AM, Max Chou wrote: According to the v spec, a vector register cannot be used to provide source operands with more than one EEW for a single instruction. Signed-off-by: Max Chou ---   target/riscv/insn_trans

[PATCH v2 03/12] target/riscv: Add vext_check_input_eew to check mismatched input EEWs encoding constraint

2025-04-05 Thread Max Chou
According to the v spec, a vector register cannot be used to provide source operands with more than one EEW for a single instruction. Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 29 + 1 file changed, 29 insertions(+) diff --git a/target/riscv
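The constraint described above can be sketched as a range-overlap test between two source register groups. This is a simplified illustration, not the vext_check_input_eew function from the patch: `SrcGroup`, `groups_overlap` and `input_eew_ok` are hypothetical names, and the number of registers per group is passed in directly instead of being derived from EEW, SEW and LMUL.

```c
#include <stdbool.h>

/* Hypothetical description of one source operand (illustration only). */
typedef struct {
    int base;   /* first vector register number of the group */
    int nregs;  /* number of registers the group spans (EMUL, at least 1) */
    int eew;    /* effective element width in bits */
} SrcGroup;

/* Two half-open register ranges [a, a+an) and [b, b+bn) overlap
 * when each one starts before the other one ends. */
static bool groups_overlap(int a, int an, int b, int bn)
{
    return a < b + bn && b < a + an;
}

/* A vector register may not supply source operands with more than one
 * EEW, so groups with different EEWs must not share any register.
 * Groups with the same EEW are allowed to overlap. */
static bool input_eew_ok(SrcGroup vs1, SrcGroup vs2)
{
    if (vs1.eew == vs2.eew) {
        return true;
    }
    return !groups_overlap(vs1.base, vs1.nregs, vs2.base, vs2.nregs);
}
```

For example, with vrgatherei16.vv at SEW=32 and LMUL=2, vs2 spans two registers at EEW=32 while vs1 holds EEW=16 indices; if vs1 is one of the registers in the vs2 group, the encoding is rejected under this sketch.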

[PATCH v2 05/12] target/riscv: rvv: Apply vext_check_input_eew to OPIVI/OPIVX/OPFVF(vext_check_ss) instructions

2025-03-30 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans

[PATCH v2 01/12] target/riscv: rvv: Source vector registers cannot overlap mask register

2025-03-29 Thread Max Chou
From: Anton Blanchard Add the relevant ISA paragraphs explaining why source (and destination) registers cannot overlap the mask register. Signed-off-by: Anton Blanchard Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 29 ++--- 1 file changed, 26

[PATCH v2 10/12] target/riscv: rvv: Apply vext_check_input_eew to vector narrow instructions

2025-03-29 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans

[PATCH v2 09/12] target/riscv: rvv: Apply vext_check_input_eew to vector widen instructions(OPMVV/OPMVX/etc.)

2025-03-29 Thread Max Chou
Handle the overlap of source registers with different EEWs. The vd of vector widening mul-add instructions is one of the input operands. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvbf16.c.inc | 9 ++- target/riscv

[PATCH v2 04/12] target/riscv: rvv: Apply vext_check_input_eew to vector register gather instructions

2025-03-29 Thread Max Chou
Handle the overlap of source registers with different EEWs. The vs1 EEW of vrgatherei16.vv is 16. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 3 +++ 1 file changed, 3 insertions(+) diff --git a/target/riscv

[PATCH v2 06/12] target/riscv: rvv: Apply vext_check_input_eew to OPIVV/OPFVV(vext_check_sss) instructions

2025-03-29 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 1 + 1 file changed, 1 insertion(+) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv

[PATCH v2 12/12] target/riscv: Fix the rvv reserved encoding of unmasked instructions

2025-03-29 Thread Max Chou
According to the v spec, the encodings of vcompress.vm and vector mask-register logical instructions with vm=0 are reserved. Signed-off-by: Max Chou --- target/riscv/insn32.decode | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/target/riscv/insn32.decode b

[PATCH v2 08/12] target/riscv: rvv: Apply vext_check_input_eew to vector integer extension instructions(OPMVV)

2025-03-29 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans

[PATCH v2 11/12] target/riscv: rvv: Apply vext_check_input_eew to vector indexed load/store instructions

2025-03-29 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/target/riscv/insn_trans

[PATCH v2 00/12] Fix RVV encoding corner cases

2025-03-29 Thread Max Chou
andling of register overlaps in vector widening/narrowing instructions 4. Fix unmasked RVV instruction encoding (e.g. vcompress.vm) Anton Blanchard (2): target/riscv: rvv: Source vector registers cannot overlap mask register target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS Ma

[PATCH v2 02/12] target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS

2025-03-29 Thread Max Chou
From: Anton Blanchard Signed-off-by: Anton Blanchard Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc

[PATCH v2 07/12] target/riscv: rvv: Apply vext_check_input_eew to vector slide instructions(OPIVI/OPIVX)

2025-03-29 Thread Max Chou
Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard Co-authored-by: Max Chou Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/target/riscv/insn_trans

Re: [PATCH 00/12] target/riscv: Fix some RISC-V instruction corner cases

2025-02-27 Thread Max Chou
Hi Anton, I hope you’re doing well. While reviewing this patchset, I noticed a few missing parts related to the mismatched input EEWs encoding constraint. I also found a few other rvv encoding issues and planned to submit an upstream patchset to address them. However, I think it would be bette

Re: [PATCH 11/12] target/riscv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS

2025-02-07 Thread Max Chou
Reviewed-by: Max Chou On 2025/1/26 3:20 PM, Anton Blanchard wrote: Signed-off-by: Anton Blanchard --- target/riscv/insn_trans/trans_rvv.c.inc | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv

Re: [PATCH 10/12] target/riscv: handle vwadd.wv form vs1 and vs2 overlap

2025-02-07 Thread Max Chou
Hi Anton, This patch violates some coding style rules of QEMU. You can verify the coding style by running the checkpatch.pl script in the QEMU repository. (ref: https://www.qemu.org/docs/master/devel/submitting-a-patch.html#use-the-qemu-coding-style) The patch 12 also has the same issue. Than

Re: [PATCH 05/12] target/riscv: handle vslide1down.vx form mask and source overlap

2025-02-07 Thread Max Chou
Hi Anton, The vext_check_slide function affects the vslide[up|down].v[x|i]/vfslide1[up|down].vf/vslide1[up|down].vx instructions rather than the vslide1down.vx instruction alone. Therefore, it would be more appropriate to update the commit message to provide clearer information. (PS:perhaps, using

Re: [PATCH 03/12] target/riscv: handle vadd.vx form mask and source overlap

2025-02-07 Thread Max Chou
Hi Anton, I think that the commit message could be improved for better clarity. The vext_check_ss function affects more RVV instructions than the vadd.vx instruction alone. (PS:perhaps using the category (OPIVX/OPFVF/etc.) to describe the affected RVV instructions would be more helpful.) Addit

Re: [PATCH 02/12] target/riscv: handle vrgather mask and source overlap

2025-02-07 Thread Max Chou
Hi Anton, You might need to extend this patch or provide a new patch to handle the different EEWs source operands checking for the vrgatherei16.vv instruction (when SEW is not 16). Thanks, Max On 2025/1/26 3:20 PM, Anton Blanchard wrote: Signed-off-by: Anton Blanchard --- target/riscv/insn

Re: [PATCH 01/12] target/riscv: Source vector registers cannot overlap mask register

2025-02-07 Thread Max Chou
Reviewed-by: Max Chou On 2025/1/26 3:20 PM, Anton Blanchard wrote: Add the relevant ISA paragraphs explaining why source (and destination) registers cannot overlap the mask register. Signed-off-by: Anton Blanchard --- target/riscv/insn_trans/trans_rvv.c.inc | 29

[PATCH] target/riscv: rvv: Fix unexpected behavior of vector reduction instructions when vl is 0

2025-01-24 Thread Max Chou
According to the Vector Reduction Operations section in the RISC-V "V" Vector Extension spec, "If vl=0, no operation is performed and the destination register is not updated." The vd should be updated when vl is larger than 0. Signed-off-by: Max Chou --- target/riscv
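The quoted rule can be illustrated with a minimal unmasked sum reduction. This is a sketch under assumptions, not the QEMU helper touched by the patch: `vredsum_u32` is a hypothetical name, elements are plain 32-bit unsigned integers, and masking and tail handling are left out.

```c
#include <stdint.h>

/* Hypothetical unmasked sum reduction: vd[0] = vs1[0] + sum(vs2[0..vl-1]).
 * When vl is 0 no operation is performed, so vd keeps its old value. */
static void vredsum_u32(uint32_t *vd, const uint32_t *vs1,
                        const uint32_t *vs2, uint32_t vl)
{
    if (vl == 0) {
        return; /* destination register must not be updated */
    }

    uint32_t acc = vs1[0]; /* scalar seed comes from element 0 of vs1 */
    for (uint32_t i = 0; i < vl; i++) {
        acc += vs2[i]; /* unsigned arithmetic wraps on overflow */
    }
    vd[0] = acc;
}
```

The early return is the whole point of the sketch: without it, a call with vl=0 would still overwrite vd[0] with the seed value from vs1.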

[PATCH] target/riscv: rvv: Fix incorrect vlen comparison in prop_vlen_set

2025-01-24 Thread Max Chou
In the prop_vlen_set function, there is an incorrect comparison between vlen (in bits) and vlenb (in bytes). This causes an unexpected error when the user applies the `vlen=1024` cpu option to a vendor predefined cpu type whose default vlen is 1024 (vlenb=128). Signed-off-by: Max Chou --- target/riscv/cpu.c
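The unit mismatch described above is easy to reproduce in isolation. The following sketch is not the prop_vlen_set code; `vlen_matches_buggy` and `vlen_matches_fixed` are hypothetical names that only contrast a bits-versus-bytes comparison with one that first converts bits to bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Buggy comparison (illustration): the requested value is in bits
 * but the stored value is in bytes, so equal lengths never match. */
static bool vlen_matches_buggy(uint32_t requested_vlen_bits,
                               uint32_t current_vlenb)
{
    return requested_vlen_bits == current_vlenb;
}

/* Fixed comparison (illustration): convert bits to bytes before
 * comparing, so both sides use the same unit. */
static bool vlen_matches_fixed(uint32_t requested_vlen_bits,
                               uint32_t current_vlenb)
{
    return (requested_vlen_bits >> 3) == current_vlenb;
}
```

With the values from the patch description, a request of `vlen=1024` against a stored vlenb of 128 is reported as a mismatch by the first function and as a match by the second.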

[PATCH] target/riscv: rvv: Fix vslide1[up|down].vx unexpected result when XLEN=32 and SEW=64

2025-01-23 Thread Max Chou
value to 64 bits during the TCG translation phase to ensure that the helper functions won't lose the higher 32 bits. Signed-off-by: Max Chou --- target/riscv/helper.h | 16 target/riscv/insn_trans/trans_rvv.c.inc | 50 - target/
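A small sketch of why the widening matters when XLEN=32 and SEW=64. It assumes sign extension of the scalar operand, which is the usual RVV rule when XLEN is smaller than SEW; the truncated snippet above does not state which extension the patch uses. Both function names are hypothetical.

```c
#include <stdint.h>

/* Widen a 32-bit scalar register value to a 64-bit element,
 * assuming sign extension (an assumption, see the lead-in). */
static uint64_t widen_rs1_xlen32(uint32_t rs1)
{
    return (uint64_t)(int64_t)(int32_t)rs1;
}

/* Model of what survives when a 64-bit value is passed through
 * a 32-bit-wide argument: the upper 32 bits are dropped. */
static uint64_t pass_through_32bit_arg(uint64_t value)
{
    return (uint32_t)value;
}
```

For a negative scalar such as 0x80000000, the widened element is 0xFFFFFFFF80000000, while the value that passes through a 32-bit argument is 0x0000000080000000, which is the kind of unexpected result the subject line refers to.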

Re: [RFC 1/1 v2] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2024-12-20 Thread Max Chou
+for (int i = 0; i < size; i += 16) { +addr = get_address(s, rs1, i); +if (is_load) { +tcg_gen_qemu_ld_i128(t16, addr, s->mem_idx, +MO_LE | MO_128 | atomicity); +tcg_gen_st_i128(t16, tcg_env, vreg_ofs(s, vd) +

Re: [PATCH v8 1/2] target/riscv: rvv: fix typo in vext continuous ldst function names

2024-12-18 Thread Max Chou
Reviewed-by: Max Chou max On 2024/12/18 10:23 PM, Craig Blackmore wrote: Replace `continus` with `continuous`. Signed-off-by: Craig Blackmore --- target/riscv/vector_helper.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/target/riscv/vector_helper.c b

Re: [PATCH v7 2/2] target/riscv: rvv: speed up small unit-stride loads and stores

2024-12-11 Thread Max Chou
On 2024/12/11 8:51 PM, Craig Blackmore wrote: Calling `vext_continuous_ldst_tlb` for load/stores smaller than 12 bytes significantly improves performance. Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini Co-authored-by: Craig Blackmore Signed-off-by: Helene CHELIN Signed-off-by: P

Re: [PATCH v6 1/1] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-12-04 Thread Max Chou
Hi Craig, I think that the unexpected vstart issue persists in this patchset. This version is unable to update the vstart CSR to the correct index when grouping load/store elements. For instance, if an exception is raised by an element following the first one, and the optimization attempts to gr

Re: [RFC v5 1/1] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-11-14 Thread Max Chou
On 2024/11/11 9:03 PM, Paolo Savini wrote: This patch improves the performance of the emulation of the RVV unit-stride loads and stores in the following cases: - when the data being loaded/stored per iteration amounts to 8 bytes or less. - when the vector length is 16 bytes (VLEN=128) and the

Re: [PATCH v6 0/7] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

2024-10-15 Thread Max Chou
ping. On 2024/9/19 1:14 AM, Max Chou wrote: Hi, This version fixes several issues in v5 - The cross page bound checking issue - The mismatched vl comparison in the early exit checking of vext_ldst_us - The endian issue when the host is big endian Thanks for Richard Henderson's suggestions that

[PATCH v6 4/7] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store

2024-09-18 Thread Max Chou
agnostic, so remove the vstart early exit checking. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 129 +++ 1 file changed, 70 insertions(+), 59 deletions(-) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index c2fcf8b3a00

[PATCH v6 2/7] target/riscv: rvv: Replace VSTART_CHECK_EARLY_EXIT in vext_ldst_us

2024-09-18 Thread Max Chou
Because the real vl (evl) of vext_ldst_us may be different (e.g. vlm.v/vsm.v/etc.), the VSTART_CHECK_EARLY_EXIT checking function should be replaced by checking evl in vext_ldst_us. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 5 - 1 file changed, 4 insertions(+), 1

[PATCH v6 6/7] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions

2024-09-18 Thread Max Chou
The vector unmasked unit-stride and whole register load/store instructions load/store continuous memory. If the host and guest architectures have the same endianness, we can group the element load/store to load/store more data at a time. Signed-off-by: Max Chou --- target/riscv
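The grouping idea can be sketched as follows: when host byte order matches the guest's, a run of elements can be copied in one block, otherwise each element is assembled byte by byte. This is an illustration under assumptions, not the vector_helper.c code: the guest is taken to be little endian, elements are 32 bits wide, and `host_is_little_endian` and `load_elems_le32` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Detect host byte order at run time by inspecting the first byte of 1. */
static bool host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Load n little-endian 32-bit guest elements into host-order values. */
static void load_elems_le32(uint32_t *dst, const uint8_t *guest_mem, size_t n)
{
    if (host_is_little_endian()) {
        /* Same endianness: move the whole run with a single copy. */
        memcpy(dst, guest_mem, n * sizeof(uint32_t));
        return;
    }

    /* Different endianness: fall back to per-element assembly. */
    for (size_t i = 0; i < n; i++) {
        const uint8_t *p = guest_mem + 4 * i;
        dst[i] = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    }
}
```

Both paths produce the same element values; the grouped path simply moves more bytes per operation, which is the source of the speed-up the patch description aims for.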

[PATCH v6 3/7] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store

2024-09-18 Thread Max Chou
the element load/store through the original softmmu flow and the direct access host memory flow. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 363 +-- 1 file changed, 224 insertions(+), 139 deletions(-) diff --git a/target/riscv/vector_helper.c b/ta

[PATCH v6 5/7] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride load-only-first load instructions

2024-09-18 Thread Max Chou
The unmasked unit-stride fault-only-first load instructions are similar to the unmasked unit-stride load/store instructions and are suitable to be optimized by using a direct access to host ram fast path. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 98

[PATCH v6 7/7] target/riscv: Inline unit-stride ld/st and corresponding functions for performance

2024-09-18 Thread Max Chou
In the vector unit-stride load/store helper functions, the vext_ldst_us & vext_ldst_whole functions account for most of the execution time. Inlining the functions can avoid the function call overhead to improve the helper function performance. Signed-off-by: Max Chou Reviewed-by: Ric

[PATCH v6 0/7] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

2024-09-18 Thread Max Chou
v2: https://lore.kernel.org/all/20240531174504.281461-1-max.c...@sifive.com/ - v3: https://lore.kernel.org/all/20240613141906.1276105-1-max.c...@sifive.com/ - v4: https://lore.kernel.org/all/20240613175122.1299212-1-max.c...@sifive.com/ - v5: https://lore.kernel.org/all/20240717133936.713642-1-max.c...@sifive.

[PATCH v6 1/7] target/riscv: Set vdata.vm field for vector load/store whole register instructions

2024-09-18 Thread Max Chou
The vm field of the vector load/store whole register instruction's encoding is 1. The helper function of the vector load/store whole register instructions may need the vdata.vm field to do some optimizations. Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 3 +++ 1

Re: [RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions

2024-07-30 Thread Max Chou
On 2024/7/25 2:04 PM, Richard Henderson wrote: On 7/17/24 23:39, Max Chou wrote: +static inline QEMU_ALWAYS_INLINE void +vext_continus_ldst_host(CPURISCVState *env, vext_ldst_elem_fn_host *ldst_host, +    void *vd, uint32_t evl, uint32_t reg_start, void *host

Re: [RFC PATCH v5 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store

2024-07-30 Thread Max Chou
On 2024/7/25 1:51 PM, Richard Henderson wrote: On 7/17/24 23:39, Max Chou wrote: @@ -199,7 +212,7 @@ static void   vext_ldst_stride(void *vd, void *v0, target_ulong base,    target_ulong stride, CPURISCVState *env,    uint32_t desc, uint32_t vm

Re: [RFC PATCH v5 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance

2024-07-30 Thread Max Chou
On 2024/7/25 2:05 PM, Richard Henderson wrote: On 7/17/24 23:39, Max Chou wrote: In the vector unit-stride load/store helper functions, the vext_ldst_us & vext_ldst_whole functions account for most of the execution time. Inlining the functions can avoid the function call overhead to improve

Re: [PATCH v3 12/12] target/riscv: Simplify probing in vext_ldff

2024-07-22 Thread Max Chou
Reviewed-by: Max Chou On 2024/7/19 9:07 AM, Richard Henderson wrote: The current pairing of tlb_vaddr_to_host with extra is either inefficient (user-only, with page_check_range) or incorrect (system, with probe_pages). For proper non-fault behaviour, use probe_access_flags with its nonfault

[RFC PATCH v5 1/5] target/riscv: Set vdata.vm field for vector load/store whole register instructions

2024-07-17 Thread Max Chou
The vm field of the vector load/store whole register instruction's encoding is 1. The helper function of the vector load/store whole register instructions may need the vdata.vm field to do some optimizations. Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 3 +++ 1

[RFC PATCH v5 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance

2024-07-17 Thread Max Chou
In the vector unit-stride load/store helper functions, the vext_ldst_us & vext_ldst_whole functions account for most of the execution time. Inlining the functions can avoid the function call overhead to improve the helper function performance. Signed-off-by: Max Chou Reviewed-by: Ric

[RFC PATCH v5 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions

2024-07-17 Thread Max Chou
The vector unmasked unit-stride and whole register load/store instructions load/store continuous memory. If the host and guest architectures have the same endianness, we can group the element load/store to load/store more data at a time. Signed-off-by: Max Chou --- target/riscv

[RFC PATCH v5 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

2024-07-17 Thread Max Chou
el.org/all/20240531174504.281461-1-max.c...@sifive.com/ - v3: https://lore.kernel.org/all/20240613141906.1276105-1-max.c...@sifive.com/ - v4: https://lore.kernel.org/all/20240613175122.1299212-1-max.c...@sifive.com/ Max Chou (5): target/riscv: Set vdata.vm field for vector load/store whol

[RFC PATCH v5 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store

2024-07-17 Thread Max Chou
agnostic, so remove the vstart early exit checking. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 123 +-- 1 file changed, 61 insertions(+), 62 deletions(-) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index 23396a1b750

[RFC PATCH v5 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store

2024-07-17 Thread Max Chou
the element load/store through the original softmmu flow and the direct access host memory flow. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 361 +-- 1 file changed, 220 insertions(+), 141 deletions(-) diff --git a/target/riscv/vector_helper.c b/ta

Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff

2024-07-15 Thread Max Chou
On 2024/7/10 11:28 AM, Richard Henderson wrote: The current pairing of tlb_vaddr_to_host with extra is either inefficient (user-only, with page_check_range) or incorrect (system, with probe_pages). For proper non-fault behaviour, use probe_access_flags with its nonfault parameter set to true. S

Re: [RFC PATCH v4 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store

2024-06-25 Thread Max Chou
On 2024/6/20 12:29 PM, Richard Henderson wrote: On 6/13/24 10:51, Max Chou wrote: This commit references the sve_ldN_r/sve_stN_r helper functions in the ARM target to optimize the vector unmasked unit-stride load/store instructions with the following items: * Get the loose bound of active elements

Re: [RFC PATCH v4 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions

2024-06-23 Thread Max Chou
On 2024/6/20 12:38 PM, Richard Henderson wrote: On 6/13/24 10:51, Max Chou wrote: The vector unmasked unit-stride and whole register load/store instructions load/store continuous memory. If the host and guest architectures have the same endianness, we can group the element load

[RFC PATCH v4 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions

2024-06-13 Thread Max Chou
The vector unmasked unit-stride and whole register load/store instructions load/store continuous memory. If the host and guest architectures have the same endianness, we can group the element load/store to load/store more data at a time. Signed-off-by: Max Chou --- target/riscv

[RFC PATCH v4 1/5] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb

2024-06-13 Thread Max Chou
If there are not any QEMU plugin memory callback functions, checking before calling the qemu_plugin_vcpu_mem_cb function can reduce the function call overhead. Signed-off-by: Max Chou --- accel/tcg/ldst_common.c.inc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a

[RFC PATCH v4 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store

2024-06-13 Thread Max Chou
new interface to direct access host memory The original element load/store interface is replaced by new element load/store functions with the _tlb & _host postfixes, which do the element load/store through the original softmmu flow and the direct access host memory flow, respectively. Signed-off-by:

[RFC PATCH v4 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance

2024-06-13 Thread Max Chou
In the vector unit-stride load/store helper functions, the vext_ldst_us & vext_ldst_whole functions account for most of the execution time. Inlining the functions can avoid the function call overhead to improve the helper function performance. Signed-off-by: Max Chou --- target/r

[RFC PATCH v4 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

2024-06-13 Thread Max Chou
.1276105-1-max.c...@sifive.com/ Max Chou (5): accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store target/riscv: rvv: Provide a fast path using direct access t

[RFC PATCH v4 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store

2024-06-13 Thread Max Chou
The vector unit-stride whole register load/store instructions are similar to the unmasked unit-stride load/store instructions and are suitable to be optimized by using a direct access to host ram fast path. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 185

[RFC PATCH v3 4/5] target/riscv: rvv: Provide group continuous ld/st flow for unit-stride ld/st instructions

2024-06-13 Thread Max Chou
The vector unmasked unit-stride and whole register load/store instructions load/store continuous memory. If the host and guest architectures have the same endianness, we can group the element load/store to load/store more data at a time. Signed-off-by: Max Chou --- target/riscv

[RFC PATCH v3 1/5] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb

2024-06-13 Thread Max Chou
If no QEMU plugin memory callback functions are registered, checking for them before calling the qemu_plugin_vcpu_mem_cb function avoids the function call overhead. Signed-off-by: Max Chou --- accel/tcg/ldst_common.c.inc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a
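A minimal sketch of the check-before-call pattern described here, under stated assumptions: mem_cb_list, dispatch_mem_cbs, notify_mem_access and count_cb are hypothetical names and do not reflect the QEMU plugin API or the contents of the patch. The point is only that a cheap inline test on the hot path skips the out-of-line call when nothing is registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CBS 4

/* Hypothetical callback type and registry (not the QEMU API). */
typedef void (*mem_cb_fn)(uint64_t vaddr, void *opaque);

struct mem_cb_list {
    mem_cb_fn cbs[MAX_CBS];
    void *opaque[MAX_CBS];
    size_t count;
};

/* Out-of-line slow path: run every registered callback. */
static void dispatch_mem_cbs(const struct mem_cb_list *l, uint64_t vaddr)
{
    assert(l->count <= MAX_CBS);
    for (size_t i = 0; i < l->count; i++) {
        l->cbs[i](vaddr, l->opaque[i]);
    }
}

/* Hot path: test for registered callbacks before making the call. */
static inline void notify_mem_access(const struct mem_cb_list *l,
                                     uint64_t vaddr)
{
    if (l->count == 0) {
        return; /* nothing registered, skip the call entirely */
    }
    dispatch_mem_cbs(l, vaddr);
}

/* Example callback: count how many accesses were reported. */
static void count_cb(uint64_t vaddr, void *opaque)
{
    (void)vaddr;
    ++*(unsigned *)opaque;
}
```

With an empty list, notify_mem_access returns after a single comparison; once count_cb is registered, each call increments the counter passed as opaque.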

[RFC PATCH v3 0/5] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

2024-06-13 Thread Max Chou
ore vector ld/st functions Previous version: - v1: https://lore.kernel.org/all/20240215192823.729209-1-max.c...@sifive.com/ - v2: https://lore.kernel.org/all/20240531174504.281461-1-max.c...@sifive.com/ Max Chou (5): accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb ta

[RFC PATCH v3 3/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unit-stride whole register load/store

2024-06-13 Thread Max Chou
The vector unit-stride whole register load/store instructions are similar to the unmasked unit-stride load/store instructions, so they are also suitable for optimization with a fast path that accesses host RAM directly. Signed-off-by: Max Chou --- target/riscv/vector_helper.c | 185

[RFC PATCH v3 2/5] target/riscv: rvv: Provide a fast path using direct access to host ram for unmasked unit-stride load/store

2024-06-13 Thread Max Chou
new interface to direct access host memory The original element load/store interface is replaced by new element load/store functions with _tlb and _host suffixes, which perform the element load/store through the original softmmu flow and through the direct host memory access flow, respectively. Signed-off-by:

[RFC PATCH v3 5/5] target/riscv: Inline unit-stride ld/st and corresponding functions for performance

2024-06-13 Thread Max Chou
In the vector unit-stride load/store helper functions, the vext_ldst_us & vext_ldst_whole functions account for most of the execution time. Inlining these functions avoids the function call overhead and improves the helper function performance. Signed-off-by: Max Chou --- target/r

Re: [RFC PATCH v2 5/6] target/riscv: rvv: Optimize v[l|s]e8.v with limitations

2024-06-03 Thread Max Chou
ions that are suggested in the tcg-op doc). I will provide the next version with a helper function implementation like sve_ldN_r in the ARM target. Thank you, Max On 2024/6/3 1:45 AM, Richard Henderson wrote: On 5/31/24 12:44, Max Chou wrote: The vector unit-stride load/store instructions (e.g. vle8.v/vs

[RFC PATCH v2 6/6] target/riscv: rvv: Optimize vl8re8.v/vs8r.v with limitations

2024-05-31 Thread Max Chou
endian Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 196 +++- 1 file changed, 194 insertions(+), 2 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index bbac73bb12b..44763ccec06 100644 --- a

[RFC PATCH v2 1/6] target/riscv: Separate vector segment ld/st instructions

2024-05-31 Thread Max Chou
This commit separates the helper function implementations of the vector segment load/store instructions from those of the other vector load/store instructions. This can improve performance by avoiding unnecessary segment operations when NF = 1. Signed-off-by: Max Chou --- target/riscv/helper.h

[RFC PATCH v2 4/6] target/riscv: Add check_probe_[read|write] helper functions

2024-05-31 Thread Max Chou
The helper_check_probe_[read|write] functions wrap the probe_pages function to perform virtual address resolution for continuous vector load/store instructions. Signed-off-by: Max Chou --- target/riscv/helper.h| 4 target/riscv/vector_helper.c | 12 2 files changed

[RFC PATCH v2 5/6] target/riscv: rvv: Optimize v[l|s]e8.v with limitations

2024-05-31 Thread Max Chou
* Without mask * Without tail agnostic * Both host and target are little endian Signed-off-by: Max Chou --- target/riscv/insn_trans/trans_rvv.c.inc | 197 +++- 1 file changed, 195 insertions(+), 2 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv

[RFC PATCH v2 3/6] target/riscv: Inline vext_ldst_us and corresponding function for performance

2024-05-31 Thread Max Chou
In the vector unit-stride load/store helper functions, the vext_ldst_us function accounts for most of the execution time. Inlining the functions avoids the function call overhead and improves the helper function performance. Signed-off-by: Max Chou Reviewed-by: Richard Henderson --- target

[RFC PATCH v2 0/6] Improve the performance of RISC-V vector unit-stride/whole register ld/st instructions

2024-05-31 Thread Max Chou
QEMU user mode. PS: This RFC patch set only focuses on the vle8.v/vse8.v/vl8re8.v/vs8r.v instructions. The next version will try to complete other instructions. Series based on riscv-to-apply.next branch (commit 1806da7). Max Chou (6): target/riscv: Separate vector segment ld/st instructions

[RFC PATCH v2 2/6] accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb

2024-05-31 Thread Max Chou
If no QEMU plugin memory callback functions are registered, checking for them before calling the qemu_plugin_vcpu_mem_cb function avoids the function call overhead. Signed-off-by: Max Chou --- accel/tcg/ldst_common.c.inc | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a

Re: [PATCH RESEND] target/riscv/cpu.c: fix Zvkb extension config

2024-05-11 Thread Max Chou
Reviewed-by: Max Chou Max On 2024/5/11 7:26 PM, Yangyu Chen wrote: This code has a typo that writes zvkb to zvkg, so users can't enable zvkb through the config. This patch fixes it. Signed-off-by: Yangyu Chen Fixes: ea61ef7097d0 ("target/riscv: Move vector crypto ext

[PATCH v2 0/4] Fix fp16 checking in vector fp widen/narrow instructions

2024-03-22 Thread Max Chou
[PATCH] target/riscv: rvv: Check single width operator for vector fp widen instructions [PATCH] target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w [PATCH] target/riscv: rvv: Remove redudant SEW checking for vector fp narrow/widen instructions Max Chou (4

[PATCH v2 3/4] target/riscv: rvv: Check single width operator for vfncvt.rod.f.f.w

2024-03-22 Thread Max Chou
The opfv_narrow_check needs to check the single width float operator by require_rvf. Signed-off-by: Max Chou Reviewed-by: Daniel Henrique Barboza --- target/riscv/insn_trans/trans_rvv.c.inc | 1 + 1 file changed, 1 insertion(+) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target

[PATCH v2 2/4] target/riscv: rvv: Check single width operator for vector fp widen instructions

2024-03-22 Thread Max Chou
width float, so the opfxv_widen_check function doesn’t need require_rvf for the single width operator(integer). Signed-off-by: Max Chou Reviewed-by: Daniel Henrique Barboza --- target/riscv/insn_trans/trans_rvv.c.inc | 5 + 1 file changed, 5 insertions(+) diff --git a/target/riscv

[PATCH v2 1/4] target/riscv: rvv: Fix Zvfhmin checking for vfwcvt.f.f.v and vfncvt.f.f.w instructions

2024-03-22 Thread Max Chou
According to section 18.4 of the V spec, only the vfwcvt.f.f.v and vfncvt.f.f.w instructions are affected by the Zvfhmin extension. The vfwcvt.f.f.v and vfncvt.f.f.w instructions only support the following conversions: * From 1*SEW(16/32) to 2*SEW(32/64) * From 2*SEW(32/64) to 1*SEW(16/32) Signed-off-by: Max Chou

[PATCH v2 4/4] target/riscv: rvv: Remove redudant SEW checking for vector fp narrow/widen instructions

2024-03-22 Thread Max Chou
If the checking functions check both the single and double width operators at the same time, then the single width operator checking functions (require_rvf[min]) will check whether the SEW is 8. Signed-off-by: Max Chou Reviewed-by: Daniel Henrique Barboza --- target/riscv/insn_trans

Re: [PATCH] Fix fp16 checking in vector fp widen/narrow instructions

2024-03-22 Thread Max Chou
Thanks for the notification. I'll resend this series and rebase on the riscv-to-apply.next branch. Max On 2024/3/22 12:12 PM, Alistair Francis wrote: On Wed, Mar 20, 2024 at 5:28 PM Max Chou wrote: When SEW is 16, we need to check whether the Zvfhmin is enabled for the single width ope

[PATCH] target/riscv: rvv: Remove the dependency of Zvfbfmin to Zfbfmin

2024-03-21 Thread Max Chou
According to the Zvfbfmin definition in the RISC-V BF16 extensions spec, the Zvfbfmin extension only requires either the V extension or the Zve32f extension. Signed-off-by: Max Chou --- target/riscv/tcg/tcg-cpu.c | 5 - 1 file changed, 5 deletions(-) diff --git a/target/riscv/tcg/tcg-cpu.c
