On 6/27/25 10:21 AM, Max Chou wrote:
> > According to the RISC-V instruction set manual, the minimum VLEN needs
> > to respect the following extensions:
> >
> > Extension Minimum VLEN
> > * V 128
> > * Zve64[d|f|x] 64
> > * Zve32[f|x
Hi Nutty,
Thanks for the suggestion.
I'll provide a new version including the new description and a fix for
another EEW issue.
Thanks,
Max
On Tue, Jul 1, 2025 at 2:43 PM Nutty Liu
wrote:
> On 6/27/2025 9:20 PM, Max Chou wrote:
> > From: Anton Blanchard
> >
>
This patchset is based on the v1 provided by Anton Blanchard, with the
following update:
* Add the missing input EEW checking rule for widening vector reduction
instructions.
Reference:
* v1: 20250415043207.3512209-1-ant...@tenstorrent.com
Anton Blanchard (3):
target/riscv: rvv: Apply vext_check_in
According to the V spec, the vector fault-only-first load instructions
may change the VL CSR.
So the ldff_trans TCG translation function should generate the
lookup_and_goto_ptr flow, as the vsetvl/vsetvli translation functions do,
to make sure the vl_eq_vlmax TB flag is correct.
Signed-off-by: Max Chou
According to the RISC-V instruction set manual, the minimum VLEN
depends on which of the following extensions is enabled:
Extension        Minimum VLEN
* V              128
* Zve64[d|f|x]   64
* Zve32[f|x]     32
Signed-off-by: Max Chou
---
target/riscv/tcg/tcg-cpu.c | 13 +++--
1 file changed, 11
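As a rough illustration of the table above, here is a minimal sketch in C.
The struct and function names are hypothetical and are not the fields or
helpers used in target/riscv/tcg/tcg-cpu.c; only the three minimum values
come from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extension flags, for illustration only. */
struct vext_cfg {
    bool ext_v;     /* V */
    bool ext_zve64; /* any of Zve64d/Zve64f/Zve64x */
    bool ext_zve32; /* any of Zve32f/Zve32x */
};

/* Smallest allowed VLEN in bits, or 0 if no vector extension is enabled. */
static uint32_t min_vlen_bits(const struct vext_cfg *cfg)
{
    if (cfg->ext_v) {
        return 128;
    }
    if (cfg->ext_zve64) {
        return 64;
    }
    if (cfg->ext_zve32) {
        return 32;
    }
    return 0;
}

/* True if vlen_bits satisfies the minimum for the enabled extension. */
static bool vlen_is_valid(const struct vext_cfg *cfg, uint32_t vlen_bits)
{
    uint32_t min = min_vlen_bits(cfg);

    return min != 0 && vlen_bits >= min;
}
```

The most demanding extension is tested first, so a configuration with
both V and Zve32x enabled is held to the 128-bit minimum.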
From: Anton Blanchard
We were marking vadc and vsbc as vm=1 instructions, which meant
vext_check_input_eew wouldn't detect mask vs source register
overlaps.
Signed-off-by: Anton Blanchard
Reviewed-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn32.decode | 10 +-
1
From: Anton Blanchard
Handle the overlap of source registers with different EEWs.
Signed-off-by: Anton Blanchard
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
From: Anton Blanchard
Handle the overlap of source registers with different EEWs.
Signed-off-by: Anton Blanchard
Reviewed-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 22 --
1 file changed, 12 insertions(+), 10 deletions(-)
diff
estions and review.
Anton Blanchard (2):
target/riscv: rvv: Source vector registers cannot overlap mask
register
target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS
Max Chou (8):
target/riscv: rvv: Apply vext_check_input_eew to vrgather instructions
to check mismatched i
From: Anton Blanchard
Add the relevant ISA paragraphs explaining why source (and destination)
registers cannot overlap the mask register.
Signed-off-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Reviewed-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b
Handle the overlap of source registers with different EEWs.
The vd of vector widening mul-add instructions is one of the input
operands.
Co-authored-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvbf16.c.inc | 9 ++-
target
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans
According to the v spec, the encodings of vcompress.vm and vector
mask-register logical instructions with vm=0 are reserved.
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn32.decode | 18 +-
1 file changed, 9 insertions(+), 9 deletions
From: Anton Blanchard
Signed-off-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Reviewed-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 18 +-
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/target/riscv/insn_trans
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/target/riscv
According to the v spec, a vector register cannot be used to provide source
operands with more than one EEW for a single instruction.
The vs1 EEW of vrgatherei16.vv is 16.
Co-authored-by: Anton Blanchard
Reviewed-by: Daniel Henrique Barboza
Signed-off-by: Max Chou
---
target/riscv/insn_trans
On 2025/4/5 5:17 PM, Daniel Henrique Barboza wrote:
On 3/29/25 11:44 AM, Max Chou wrote:
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1
On 2025/4/5 5:14 PM, Daniel Henrique Barboza wrote:
On 3/29/25 11:44 AM, Max Chou wrote:
Handle the overlap of source registers with different EEWs.
The vs1 EEW of vrgatherei16.vv is 16.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Since you're marked as Author you don
On 2025/4/5 5:09 PM, Daniel Henrique Barboza wrote:
On 3/29/25 11:44 AM, Max Chou wrote:
According to the v spec, a vector register cannot be used to provide
source
operands with more than one EEW for a single instruction.
Signed-off-by: Max Chou
---
target/riscv/insn_trans
According to the v spec, a vector register cannot be used to provide source
operands with more than one EEW for a single instruction.
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 29 +
1 file changed, 29 insertions(+)
diff --git a/target/riscv
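The constraint quoted above can be sketched roughly as follows. This is a
simplified illustration, not the vext_check_input_eew function from the
patch: the function names and parameters are hypothetical, and each operand
is described only by its first register, the number of registers it
occupies (at least 1, even for fractional EMUL) and its EEW in bits.

```c
#include <stdbool.h>

/* True if register ranges [a, a + na) and [b, b + nb) share a register. */
static bool vregs_overlap(int a, int na, int b, int nb)
{
    return a < b + nb && b < a + na;
}

/*
 * True if two source operands read the same vector register with
 * different EEWs, which is the case the quoted rule forbids.
 */
static bool src_eew_conflict(int vs1, int nregs1, int eew1,
                             int vs2, int nregs2, int eew2)
{
    return eew1 != eew2 && vregs_overlap(vs1, nregs1, vs2, nregs2);
}
```

For example, vrgatherei16.vv with SEW=32 reads vs2 with EEW=32 and vs1 with
EEW=16, so under this sketch using the same register for both operands is
flagged, while distinct registers are not.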
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans
From: Anton Blanchard
Add the relevant ISA paragraphs explaining why source (and destination)
registers cannot overlap the mask register.
Signed-off-by: Anton Blanchard
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 29 ++---
1 file changed, 26
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans
Handle the overlap of source registers with different EEWs.
The vd of vector widening mul-add instructions is one of the input
operands.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvbf16.c.inc | 9 ++-
target/riscv
Handle the overlap of source registers with different EEWs.
The vs1 EEW of vrgatherei16.vv is 16.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 3 +++
1 file changed, 3 insertions(+)
diff --git a/target/riscv
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv
According to the v spec, the encodings of vcompress.vm and vector
mask-register logical instructions with vm=0 are reserved.
Signed-off-by: Max Chou
---
target/riscv/insn32.decode | 18 +-
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/target/riscv/insn32.decode b
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 6 --
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/target/riscv/insn_trans
andling of register overlaps in vector widening/narrowing
instructions
4. Fix unmasked RVV instruction encoding (e.g. vcompress.vm)
Anton Blanchard (2):
target/riscv: rvv: Source vector registers cannot overlap mask
register
target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS
Ma
From: Anton Blanchard
Signed-off-by: Anton Blanchard
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 18 +-
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv/insn_trans/trans_rvv.c.inc
Handle the overlap of source registers with different EEWs.
Co-authored-by: Anton Blanchard
Co-authored-by: Max Chou
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/target/riscv/insn_trans
Hi Anton,
I hope you’re doing well.
While reviewing this patchset, I noticed a few missing parts related to
the mismatched input EEWs encoding constraint.
I also found a few other rvv encoding issues and planned to submit an
upstream patchset to address them.
However, I think it would be bette
Reviewed-by: Max Chou
On 2025/1/26 3:20 PM, Anton Blanchard wrote:
Signed-off-by: Anton Blanchard
---
target/riscv/insn_trans/trans_rvv.c.inc | 18 +-
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv
Hi Anton,
This patch violates some coding style rules of QEMU.
You can verify the coding style by running the checkpatch.pl script in
the QEMU repository.
(ref:
https://www.qemu.org/docs/master/devel/submitting-a-patch.html#use-the-qemu-coding-style)
Patch 12 also has the same issue.
Than
Hi Anton,
The vext_check_slide function affects the
vslide[up|down].v[x|i]/vfslide1[up|down].vf/vslide1[up|down].vx
instructions, not the vslide1down.vx instruction alone.
Therefore, it would be more appropriate to update the commit message to
provide clearer information.
(PS:perhaps, using
Hi Anton,
I think that the commit message could be improved for better clarity.
The vext_check_ss function affects more RVV instructions than the
vadd.vx instruction alone.
(PS:perhaps using the category (OPIVX/OPFVF/etc.) to describe the
affected RVV instructions would be more helpful.)
Addit
Hi Anton,
You might need to extend this patch or provide a new patch to handle
the checking of source operands with different EEWs for the
vrgatherei16.vv instruction (when SEW is not 16).
Thanks,
Max
On 2025/1/26 3:20 PM, Anton Blanchard wrote:
Signed-off-by: Anton Blanchard
---
target/riscv/insn
Reviewed-by: Max Chou
On 2025/1/26 3:20 PM, Anton Blanchard wrote:
Add the relevant ISA paragraphs explaining why source (and destination)
registers cannot overlap the mask register.
Signed-off-by: Anton Blanchard
---
target/riscv/insn_trans/trans_rvv.c.inc | 29
According to the Vector Reduction Operations section in the RISC-V "V"
Vector Extension spec,
"If vl=0, no operation is performed and the destination register is not
updated."
The vd should be updated when vl is larger than 0.
Signed-off-by: Max Chou
---
target/riscv
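The quoted vl=0 rule can be illustrated with a simplified unsigned sum
reduction. This is only a sketch under stated assumptions: the function
name is hypothetical, and masking, vstart and tail handling are left out,
so it is not the helper changed by the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified sum reduction:
 *   vd[0] = vs1[0] + vs2[0] + ... + vs2[vl - 1]
 * When vl is 0, nothing is computed and vd is not written.
 */
static void vredsum_u32_sketch(uint32_t *vd, const uint32_t *vs1,
                               const uint32_t *vs2, size_t vl)
{
    if (vl == 0) {
        return; /* destination register is left untouched */
    }

    uint32_t acc = vs1[0];

    for (size_t i = 0; i < vl; i++) {
        acc += vs2[i];
    }
    vd[0] = acc;
}
```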
In the prop_vlen_set function, there is an incorrect comparison between
vlen (in bits) and vlenb (in bytes).
This causes an unexpected error when the user applies the `vlen=1024` cpu
option to a vendor predefined cpu type whose default vlen is
1024 (vlenb=128).
Signed-off-by: Max Chou
---
target/riscv/cpu.c
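The unit mismatch described above can be shown with a small sketch. Both
function names are hypothetical and this is not the code in
target/riscv/cpu.c; it only contrasts comparing bits with bytes against
comparing bits with bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mismatched units: a VLEN in bits is compared with a VLENB in bytes. */
static bool vlen_unchanged_buggy(uint32_t vlen_bits, uint16_t vlenb)
{
    return vlen_bits == vlenb;
}

/* Consistent units: VLENB is converted to bits (x8) before comparing. */
static bool vlen_unchanged(uint32_t vlen_bits, uint16_t vlenb)
{
    return vlen_bits == ((uint32_t)vlenb << 3);
}
```

With `vlen=1024` and a default vlenb of 128, the mismatched comparison
reports a change even though the requested value equals the default.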
value
to 64 bits during the TCG translation phase to ensure that the helper
functions won't lose the higher 32 bits.
Signed-off-by: Max Chou
---
target/riscv/helper.h | 16
target/riscv/insn_trans/trans_rvv.c.inc | 50 -
target/
+for (int i = 0; i < size; i += 16) {
+addr = get_address(s, rs1, i);
+if (is_load) {
+tcg_gen_qemu_ld_i128(t16, addr, s->mem_idx,
+MO_LE | MO_128 | atomicity);
+tcg_gen_st_i128(t16, tcg_env, vreg_ofs(s, vd) +
Reviewed-by: Max Chou
max
On 2024/12/18 10:23 PM, Craig Blackmore wrote:
Replace `continus` with `continuous`.
Signed-off-by: Craig Blackmore
---
target/riscv/vector_helper.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/target/riscv/vector_helper.c b
On 2024/12/11 8:51 PM, Craig Blackmore wrote:
Calling `vext_continuous_ldst_tlb` for load/stores smaller than 12 bytes
significantly improves performance.
Co-authored-by: Helene CHELIN
Co-authored-by: Paolo Savini
Co-authored-by: Craig Blackmore
Signed-off-by: Helene CHELIN
Signed-off-by: P
Hi Craig,
I think that the unexpected vstart issue persists in this patchset.
This version is unable to update the vstart CSR to the correct index when
grouping load/store elements.
For instance, if an exception is raised by an element following the first
one, and the optimization attempts to gr
On 2024/11/11 9:03 PM, Paolo Savini wrote:
This patch improves the performance of the emulation of the RVV unit-stride
loads and stores in the following cases:
- when the data being loaded/stored per iteration amounts to 8 bytes or less.
- when the vector length is 16 bytes (VLEN=128) and the
ping.
On 2024/9/19 1:14 AM, Max Chou wrote:
Hi,
This version fixes several issues in v5
- The cross-page boundary checking issue
- The mismatched vl comparison in the early exit checking of vext_ldst_us
- The endianness issue when the host is big endian
Thanks for Richard Henderson's suggestions that
agnostic, so remove the vstart early exit checking.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 129 +++
1 file changed, 70 insertions(+), 59 deletions(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index c2fcf8b3a00
The real vl (evl) of vext_ldst_us may differ from vl (e.g.
vlm.v/vsm.v/etc.), so the VSTART_CHECK_EARLY_EXIT checking function
should be replaced by checking evl in vext_ldst_us.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 5 -
1 file changed, 4 insertions(+), 1
The vector unmasked unit-stride and whole register load/store
instructions load/store continuous memory. If the host and guest
architectures have the same endianness, we can group the element
load/stores to load/store more data at a time.
Signed-off-by: Max Chou
---
target/riscv
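The grouping idea can be sketched as follows for a little-endian guest.
This is an illustration only, with hypothetical function names, and it is
not the implementation in vector_helper.c: when the host byte order matches
the guest, the per-element loop collapses into one bulk copy.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Detect the host byte order at run time. */
static bool host_is_little_endian(void)
{
    const uint16_t probe = 1;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Load n 16-bit little-endian guest elements from src into dst. */
static void load_u16_elems(uint16_t *dst, const uint8_t *src, size_t n)
{
    if (host_is_little_endian()) {
        /* Same byte order: group all elements into a single copy. */
        memcpy(dst, src, n * sizeof(uint16_t));
        return;
    }
    /* Different byte order: assemble each element separately. */
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));
    }
}
```

Both paths produce the same values in dst; only the number of memory
operations differs.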
the
element load/store through the original softmmu flow and the direct
access host memory flow.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 363 +--
1 file changed, 224 insertions(+), 139 deletions(-)
diff --git a/target/riscv/vector_helper.c b/ta
The unmasked unit-stride fault-only-first load instructions are similar
to the unmasked unit-stride load/store instructions, so they are also
suitable to be optimized by a fast path using direct access to host ram.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 98
In the vector unit-stride load/store helper functions, the vext_ldst_us
& vext_ldst_whole functions account for most of the execution time.
Inlining these functions avoids the function call overhead and improves
the helper function performance.
Signed-off-by: Max Chou
Reviewed-by: Ric
v2: https://lore.kernel.org/all/20240531174504.281461-1-max.c...@sifive.com/
- v3: https://lore.kernel.org/all/20240613141906.1276105-1-max.c...@sifive.com/
- v4: https://lore.kernel.org/all/20240613175122.1299212-1-max.c...@sifive.com/
- v5: https://lore.kernel.org/all/20240717133936.713642-1-max.c...@sifive.
The vm field of the vector load/store whole register instruction's
encoding is 1.
The helper function of the vector load/store whole register instructions
may need the vdata.vm field to do some optimizations.
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 3 +++
1
On 2024/7/25 2:04 PM, Richard Henderson wrote:
On 7/17/24 23:39, Max Chou wrote:
+static inline QEMU_ALWAYS_INLINE void
+vext_continus_ldst_host(CPURISCVState *env, vext_ldst_elem_fn_host
*ldst_host,
+ void *vd, uint32_t evl, uint32_t reg_start,
void *host
On 2024/7/25 1:51 PM, Richard Henderson wrote:
On 7/17/24 23:39, Max Chou wrote:
@@ -199,7 +212,7 @@ static void
vext_ldst_stride(void *vd, void *v0, target_ulong base,
target_ulong stride, CPURISCVState *env,
uint32_t desc, uint32_t vm
On 2024/7/25 2:05 PM, Richard Henderson wrote:
On 7/17/24 23:39, Max Chou wrote:
In the vector unit-stride load/store helper functions, the vext_ldst_us
& vext_ldst_whole functions account for most of the execution time.
Inlining the functions can avoid the function call overhead to improve
Reviewed-by: Max Chou
On 2024/7/19 9:07 AM, Richard Henderson wrote:
The current pairing of tlb_vaddr_to_host with extra is either
inefficient (user-only, with page_check_range) or incorrect
(system, with probe_pages).
For proper non-fault behaviour, use probe_access_flags with
its nonfault
The vm field of the vector load/store whole register instruction's
encoding is 1.
The helper function of the vector load/store whole register instructions
may need the vdata.vm field to do some optimizations.
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 3 +++
1
In the vector unit-stride load/store helper functions, the vext_ldst_us
& vext_ldst_whole functions account for most of the execution time.
Inlining these functions avoids the function call overhead and improves
the helper function performance.
Signed-off-by: Max Chou
Reviewed-by: Ric
The vector unmasked unit-stride and whole register load/store
instructions load/store continuous memory. If the host and guest
architectures have the same endianness, we can group the element
load/stores to load/store more data at a time.
Signed-off-by: Max Chou
---
target/riscv
el.org/all/20240531174504.281461-1-max.c...@sifive.com/
- v3: https://lore.kernel.org/all/20240613141906.1276105-1-max.c...@sifive.com/
- v4: https://lore.kernel.org/all/20240613175122.1299212-1-max.c...@sifive.com/
Max Chou (5):
target/riscv: Set vdata.vm field for vector load/store whol
agnostic, so remove the vstart early exit checking.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 123 +--
1 file changed, 61 insertions(+), 62 deletions(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 23396a1b750
the
element load/store through the original softmmu flow and the direct
access host memory flow.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 361 +--
1 file changed, 220 insertions(+), 141 deletions(-)
diff --git a/target/riscv/vector_helper.c b/ta
On 2024/7/10 11:28 AM, Richard Henderson wrote:
The current pairing of tlb_vaddr_to_host with extra is either
inefficient (user-only, with page_check_range) or incorrect
(system, with probe_pages).
For proper non-fault behaviour, use probe_access_flags with
its nonfault parameter set to true.
S
On 2024/6/20 12:29 PM, Richard Henderson wrote:
On 6/13/24 10:51, Max Chou wrote:
This commit references the sve_ldN_r/sve_stN_r helper functions in the
ARM target to optimize the vector unmasked unit-stride load/store
instructions with the following items:
* Get the loose bound of active elements
On 2024/6/20 12:38 PM, Richard Henderson wrote:
On 6/13/24 10:51, Max Chou wrote:
The vector unmasked unit-stride and whole register load/store
instructions load/store continuous memory. If the host and guest
architectures have the same endianness, we can group the
element load
The vector unmasked unit-stride and whole register load/store
instructions load/store continuous memory. If the host and guest
architectures have the same endianness, we can group the element
load/stores to load/store more data at a time.
Signed-off-by: Max Chou
---
target/riscv
If there are no QEMU plugin memory callback functions, checking for them
before calling the qemu_plugin_vcpu_mem_cb function reduces the
function call overhead.
Signed-off-by: Max Chou
---
accel/tcg/ldst_common.c.inc | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a
new interface to direct access host memory
The original element load/store interface is replaced by new element
load/store functions with _tlb & _host suffixes, which perform the
element load/store through the original softmmu flow and the direct
access host memory flow, respectively.
Signed-off-by:
In the vector unit-stride load/store helper functions, the vext_ldst_us
& vext_ldst_whole functions account for most of the execution time.
Inlining these functions avoids the function call overhead and improves
the helper function performance.
Signed-off-by: Max Chou
---
target/r
.1276105-1-max.c...@sifive.com/
Max Chou (5):
accel/tcg: Avoid unnecessary call overhead from
qemu_plugin_vcpu_mem_cb
target/riscv: rvv: Provide a fast path using direct access to host ram
for unmasked unit-stride load/store
target/riscv: rvv: Provide a fast path using direct access t
The vector unit-stride whole register load/store instructions are
similar to the unmasked unit-stride load/store instructions, so they are
also suitable to be optimized by a fast path using direct access to
host ram.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 185
The vector unmasked unit-stride and whole register load/store
instructions load/store continuous memory. If the host and guest
architectures have the same endianness, we can group the element
load/stores to load/store more data at a time.
Signed-off-by: Max Chou
---
target/riscv
If there are no QEMU plugin memory callback functions, checking for them
before calling the qemu_plugin_vcpu_mem_cb function reduces the
function call overhead.
Signed-off-by: Max Chou
---
accel/tcg/ldst_common.c.inc | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a
ore vector ld/st functions
Previous version:
- v1: https://lore.kernel.org/all/20240215192823.729209-1-max.c...@sifive.com/
- v2: https://lore.kernel.org/all/20240531174504.281461-1-max.c...@sifive.com/
Max Chou (5):
accel/tcg: Avoid unnecessary call overhead from
qemu_plugin_vcpu_mem_cb
ta
The vector unit-stride whole register load/store instructions are
similar to the unmasked unit-stride load/store instructions, so they are
also suitable to be optimized by a fast path using direct access to
host ram.
Signed-off-by: Max Chou
---
target/riscv/vector_helper.c | 185
new interface to direct access host memory
The original element load/store interface is replaced by new element
load/store functions with _tlb & _host suffixes, which perform the
element load/store through the original softmmu flow and the direct
access host memory flow, respectively.
Signed-off-by:
In the vector unit-stride load/store helper functions, the vext_ldst_us
& vext_ldst_whole functions account for most of the execution time.
Inlining these functions avoids the function call overhead and improves
the helper function performance.
Signed-off-by: Max Chou
---
target/r
ions that suggested in tcg-op doc).
I will provide the next version with a helper function implementation
like sve_ldN_r in the ARM target.
Thank you,
Max
On 2024/6/3 1:45 AM, Richard Henderson wrote:
On 5/31/24 12:44, Max Chou wrote:
The vector unit-stride load/store instructions (e.g. vle8.v/vs
endian
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 196 +++-
1 file changed, 194 insertions(+), 2 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv/insn_trans/trans_rvv.c.inc
index bbac73bb12b..44763ccec06 100644
--- a
This commit separates the helper function implementations of vector
segment load/store instructions from those of other vector load/store
instructions.
This can improve performance by avoiding unnecessary segment operations
when NF = 1.
Signed-off-by: Max Chou
---
target/riscv/helper.h
The helper_check_probe_[read|write] functions wrap the probe_pages
function to perform virtual address resolution for continuous vector
load/store instructions.
Signed-off-by: Max Chou
---
target/riscv/helper.h| 4
target/riscv/vector_helper.c | 12
2 files changed
* Without mask
* Without tail agnostic
* Both host and target are little endian
Signed-off-by: Max Chou
---
target/riscv/insn_trans/trans_rvv.c.inc | 197 +++-
1 file changed, 195 insertions(+), 2 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv
In the vector unit-stride load/store helper functions, the vext_ldst_us
function accounts for most of the execution time. Inlining the function
avoids the function call overhead and improves the helper function
performance.
Signed-off-by: Max Chou
Reviewed-by: Richard Henderson
---
target
QEMU user mode.
PS: This RFC patch set only focuses on the vle8.v/vse8.v/vl8re8.v/vs8r.v
instructions. The next version will try to complete other instructions.
Series based on riscv-to-apply.next branch (commit 1806da7).
Max Chou (6):
target/riscv: Separate vector segment ld/st instructions
If there are no QEMU plugin memory callback functions, checking for them
before calling the qemu_plugin_vcpu_mem_cb function reduces the
function call overhead.
Signed-off-by: Max Chou
---
accel/tcg/ldst_common.c.inc | 8 ++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a
Reviewed-by: Max Chou
Max
On 2024/5/11 7:26 PM, Yangyu Chen wrote:
This code has a typo that writes zvkb as zvkg, so users can't
enable zvkb through the config. This patch fixes it.
Signed-off-by: Yangyu Chen
Fixes: ea61ef7097d0 ("target/riscv: Move vector crypto ext
[PATCH] target/riscv: rvv: Check single width operator for vector fp
widen instructions
[PATCH] target/riscv: rvv: Check single width operator for
vfncvt.rod.f.f.w
[PATCH] target/riscv: rvv: Remove redudant SEW checking for vector fp
narrow/widen instructions
Max Chou (4
The opfv_narrow_check needs to check the single width float operator by
require_rvf.
Signed-off-by: Max Chou
Reviewed-by: Daniel Henrique Barboza
---
target/riscv/insn_trans/trans_rvv.c.inc | 1 +
1 file changed, 1 insertion(+)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target
width float, so the opfxv_widen_check function doesn’t
need require_rvf for the single width operator (integer).
Signed-off-by: Max Chou
Reviewed-by: Daniel Henrique Barboza
---
target/riscv/insn_trans/trans_rvv.c.inc | 5 +
1 file changed, 5 insertions(+)
diff --git a/target/riscv
According to v spec 18.4, only the vfwcvt.f.f.v and vfncvt.f.f.w
instructions are affected by the Zvfhmin extension.
The vfwcvt.f.f.v and vfncvt.f.f.w instructions only support the
following conversions:
* From 1*SEW(16/32) to 2*SEW(32/64)
* From 2*SEW(32/64) to 1*SEW(16/32)
Signed-off-by: Max Chou
If the checking functions check both the single and double width
operators at the same time, then the single width operator checking
functions (require_rvf[min]) will check whether the SEW is 8.
Signed-off-by: Max Chou
Reviewed-by: Daniel Henrique Barboza
---
target/riscv/insn_trans
Thanks for the notification.
I'll resend this series and rebase on the riscv-to-apply.next branch.
Max
On 2024/3/22 12:12 PM, Alistair Francis wrote:
On Wed, Mar 20, 2024 at 5:28 PM Max Chou wrote:
When SEW is 16, we need to check whether the Zvfhmin is enabled for the
single width ope
According to the Zvfbfmin definition in the RISC-V BF16 extensions spec,
the Zvfbfmin extension only requires either the V extension or the
Zve32f extension.
Signed-off-by: Max Chou
---
target/riscv/tcg/tcg-cpu.c | 5 -
1 file changed, 5 deletions(-)
diff --git a/target/riscv/tcg/tcg-cpu.c