ed, Mar 12, 2025 at 03:55:47PM +0000, Paolo Savini wrote:
This commit improves the performance of QEMU when emulating strided vector
loads and stores by substituting the call for the helper function with the
generation of equivalent TCG operations.
Signed-off-by: Paolo Savini
Reviewed-by: Danie
can update vstart correctly.
We also use the helper function when it performs better than tcg for specific
combinations of vector length, number of fields and element size.
Signed-off-by: Paolo Savini
Reviewed-by: Daniel Henrique Barboza
Reviewed-by: Richard Henderson
Reviewed-by: Max Chou
elene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
target/riscv: use tcg ops generation to emulate whole reg rvv
loads/stores.
target/riscv/insn_trans/trans_rvv.c.inc | 155 +---
1 file changed, 108 insertions(+
direct call
to probe_access_flags.
Signed-off-by: Paolo Savini
Reviewed-by: Daniel Henrique Barboza
---
target/riscv/vector_helper.c | 57 +++-
1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
[RISC-V/RVV] Expand the probe_pages helper function to handle probe
flags.
target/riscv/vector_helper.c | 57 +++-
1 file changed
flag that is
not a watchpoint flag (that per standard is allowed by this instruction) we
proceed with the per element probing to find the index of the element causing
the exception and set vl to such index.
Signed-off-by: Paolo Savini
Reviewed-by: Daniel Henrique Barboza
---
target/riscv
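The per-element probing described above can be illustrated with a small
standalone sketch. This is not the code from the patch: ff_trim_vl,
probe_fn and probe_below_limit are hypothetical names, and the probe is a
plain callback rather than QEMU's probe_access_flags. It only shows the
fault-only-first rule: a fault on element 0 traps, a fault on any later
element trims vl to that element's index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-element memory probe.
 * Returns true when [addr, addr + size) is accessible. */
typedef bool (*probe_fn)(uint64_t addr, size_t size, void *opaque);

/* Example probe (assumption): everything below *limit is accessible. */
static bool probe_below_limit(uint64_t addr, size_t size, void *opaque)
{
    uint64_t limit = *(const uint64_t *)opaque;
    return addr + size <= limit;
}

/* Probe elements one by one and return the new vl.
 * If element 0 faults, *trap is set and vl is left unchanged.
 * If a later element faults, vl is trimmed to that element's index. */
static size_t ff_trim_vl(uint64_t base, size_t esz, size_t vl,
                         probe_fn probe, void *opaque, bool *trap)
{
    assert(probe != NULL && trap != NULL);
    *trap = false;
    for (size_t i = 0; i < vl; i++) {
        if (!probe(base + i * esz, esz, opaque)) {
            if (i == 0) {
                *trap = true;   /* first element faults: raise the trap */
                return vl;
            }
            return i;           /* later element faults: shrink vl */
        }
    }
    return vl;                  /* no fault: vl is unchanged */
}
```

With 4-byte elements starting at 0x1000 and accessible memory ending at
0x1010, a request for 8 elements would come back with vl set to 4.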
Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
target/riscv: optimize the memory probing for vector fault-only-first
loads
: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
[RISC-V/RVV] Generate strided vector loads/stores with tcg nodes.
target/riscv/insn_trans/trans_rvv.c.inc | 323
1 file changed, 273 insertions(+), 50 deletions(-)
--
2.34.1
This commit improves the performance of QEMU when emulating strided vector
loads and stores by substituting the call for the helper function with the
generation of equivalent TCG operations.
Signed-off-by: Paolo Savini
Reviewed-by: Daniel Henrique Barboza
---
target/riscv/insn_trans
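For readers unfamiliar with strided accesses, the sketch below shows the
memory access pattern that either the helper or the generated TCG
operations have to reproduce. It is plain C over a flat byte buffer, not
TCG code and not taken from the patch; strided_load and its parameters are
hypothetical, and masking and vstart handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load vl elements of esz bytes each into vd.
 * Element i is read from mem + base + i * stride, where the stride is a
 * signed byte offset between consecutive elements. */
static void strided_load(uint8_t *vd, const uint8_t *mem, size_t base,
                         ptrdiff_t stride, size_t esz, size_t vl)
{
    assert(vd != NULL && mem != NULL);
    for (size_t i = 0; i < vl; i++) {
        const uint8_t *src = mem + base + (ptrdiff_t)i * stride;
        memcpy(vd + i * esz, src, esz);
    }
}
```

A stride equal to the element size degenerates into a unit-stride access,
which is why the two cases can share most of their logic.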
t the
flags.
Cc: Richard Henderson
Cc: Palmer Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
[RISC-V/RVV] Expand the probe_pages h
direct call
to probe_access_flags.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c | 57 +++-
1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 772cff8fbe..c0f1b7994e 100644
flag that is
not a watchpoint flag (that per standard is allowed by this instruction) we
proceed with the per element probing to find the index of the element causing
the exception and set vl to such index.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c | 103
change the heading from RFC to PATCH.
I also take the opportunity to thank Daniel Barboza for the review.
Cc: Richard Henderson
Cc: Palmer Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
[RISC-V/RVV] Generate strided vector loads/stores with tcg nodes.
target/riscv/insn_trans/trans_rvv.c.inc | 294
1 file changed, 244
This commit improves the performance of QEMU when emulating strided vector
loads and stores by substituting the call for the helper function with the
generation of equivalent TCG operations.
Signed-off-by: Paolo Savini
---
target/riscv/insn_trans/trans_rvv.c.inc | 294
: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
target/riscv: optimize the memory probing for vector fault-only-first
loads.
ta
flag that is
not a watchpoint flag (that per standard is allowed by this instruction) we
proceed with the per element probing to find the index of the element causing
the exception and set vl to such index.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c | 91
Hi Alex,
thanks for the review!
On 1/22/25 17:43, Alex Bennée wrote:
Paolo Savini writes:
This patch replaces the use of a helper function with direct tcg ops generation
in order to emulate whole register loads and stores. This is done in order to
improve the performance of QEMU.
Generally
ur of the patch.
Cc: Richard Henderson
Cc: Palmer Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
target/riscv: use tcg ops genera
can update vstart correctly.
We also use the helper function when it performs better than tcg for specific
combinations of vector length, number of fields and element size.
Signed-off-by: Paolo Savini
---
target/riscv/insn_trans/trans_rvv.c.inc | 164 +---
1 file changed, 119
w and explanations.
Cc: Richard Henderson
Cc: Palmer Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
target/riscv: use tcg ops generati
register load or store.
Signed-off-by: Paolo Savini
---
target/riscv/insn_trans/trans_rvv.c.inc | 125 +++-
1 file changed, 78 insertions(+), 47 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv/insn_trans/trans_rvv.c.inc
index b9883a5d32..c2c2c53254
almer Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
target/riscv: use tcg ops generation to emulate whole reg rvv
loads/s
register load or store.
Signed-off-by: Paolo Savini
---
target/riscv/insn_trans/trans_rvv.c.inc | 125 +++-
1 file changed, 78 insertions(+), 47 deletions(-)
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc
b/target/riscv/insn_trans/trans_rvv.c.inc
index b9883a5d32..c2c2c53254
This patch aims at emulating the whole register loads and stores through
direct generation of tcg operations rather than through the aid of a helper
function.
Signed-off-by: Paolo Savini
---
target/riscv/insn_trans/trans_rvv.c.inc | 104 +---
1 file changed, 56 insertions
athan Egge
Cc: Max Chou
Cc: Jeremy Bennett
Cc: Craig Blackmore
Paolo Savini (1):
target/riscv: use tcg ops generation to emulate whole reg rvv
loads/stores.
target/riscv/insn_trans/trans_rvv.c.inc | 104 +---
1 file changed, 56 insertions(+), 48 deletions(-)
--
2.34.1
/qemu/blob/134b443512825bed401b6e141447b8cdc22d2efe/target/riscv/vector_helper.c#L224
Thanks
Paolo
On 11/8/24 09:11, Richard Henderson wrote:
On 11/7/24 12:58, Daniel Henrique Barboza wrote:
On 11/4/24 9:48 AM, Richard Henderson wrote:
On 10/30/24 15:25, Paolo Savini wrote:
On 10/30/24 11:4
Version 5 of the patch set splits the patches into independent submissions
to simplify the review process.
Previous versions:
- v1:
https://lore.kernel.org/all/20240717153040.11073-1-paolo.sav...@embecosm.com/
- v2:
https://lore.kernel.org/all/20241002135708.99146-1-paolo.sav...@embecosm.
sters (LMUL=1).
The optimization consists of avoiding the overhead of probing the RAM of the
host machine and doing a loop load/store on the input data grouped in chunks
of as many bytes as possible (8,4,2,1 bytes).
Co-authored-by: Helene CHELIN
Co-authored-by: Paolo Savini
Signed-off-by: Helene C
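The chunked loop described above can be sketched as follows. This is an
illustration only, not the code from the patch: copy_in_chunks is a
hypothetical name, and memcpy on plain host buffers stands in for the
guest load/store accessors. The loop always moves the widest chunk
(8, 4, 2 or 1 bytes) that still fits in the remaining data and returns the
number of chunks it used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy len bytes from src to dst, always using the widest chunk
 * (8, 4, 2, then 1 bytes) that fits in what is left.
 * Returns the number of chunks moved. */
static size_t copy_in_chunks(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t off = 0;
    size_t chunks = 0;

    assert(len == 0 || (dst != NULL && src != NULL));
    for (size_t width = 8; width >= 1; width /= 2) {
        while (len - off >= width) {
            memcpy(dst + off, src + off, width);
            off += width;
            chunks++;
        }
    }
    return chunks;
}
```

For 15 bytes this takes four steps (8 + 4 + 2 + 1) instead of fifteen
single-byte accesses.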
Thanks for the review Richard.
On 10/30/24 11:40, Richard Henderson wrote:
On 10/29/24 19:43, Paolo Savini wrote:
This patch optimizes the emulation of unit-stride load/store RVV instructions
when the data being loaded/stored per iteration amounts to 16 bytes or more.
The optimization
Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Helene CHELIN (1):
target/riscv: rvv: reduce the overhead for simple RISC-V vector
unit-stride loads and stores
Paolo Savini (1):
target/riscv: rvv: improve performance of RISC-V vector loads and
stores on
and the
destination memory address and vice versa.
This is done only if we have direct access to the RAM of the host machine,
if the host is little-endian and if it supports atomic 128-bit memory
operations.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c| 17
sters (LMUL=1).
The optimization consists of avoiding the overhead of probing the RAM of the
host machine and doing a loop load/store on the input data grouped in chunks
of as many bytes as possible (8,4,2,1 bytes).
Co-authored-by: Helene CHELIN
Co-authored-by: Paolo Savini
Signed-off-by: Helene C
f the
vector registers (LMUL=1).
The optimization consists of avoiding the overhead of probing the RAM of the
host machine and doing a loop load/store on the input data grouped in chunks
of as many bytes as possible (8,4,2,1 bytes).
Co-authored-by: Helene CHELIN
Co-authored-by: Paolo Savini
S
register and the
destination memory address and vice versa.
This is done only if we have direct access to the RAM of the host machine,
if the host is little-endian and if it supports atomic 128-bit memory
operations.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c | 14 +-
1
Cc: Palmer Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Helene CHELIN (1):
target/riscv: rvv: reduce the overhead for simple RISC-V vector
unit-stride loads and stores
Paolo Savini
The simplified emulation of vector loads and stores that bypasses the memory
probing in the vext_ldst_us helper function seems to benefit only the user mode.
We therefore limit this approach to the user mode configuration.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c | 3 ++-
1
f the
vector registers (LMUL=1).
The optimization consists of avoiding the overhead of probing the RAM of the
host machine and doing a loop load/store on the input data grouped in chunks
of as many bytes as possible (8,4,2,1 bytes).
Co-authored-by: Helene CHELIN
Co-authored-by: Paolo Savini
S
Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Helene CHELIN (1):
target/riscv: rvv: reduce the overhead for simple RISC-V vector
unit-stride loads and stores
Paolo Savini (1):
target/riscv: use a simplified loop to emulate
The simplified emulation of vector loads and stores that bypasses the memory
probing in the vext_ldst_us helper function seems to benefit only the user mode.
We therefore limit this approach to the user mode configuration.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c | 3 ++-
1
load/store loop for small
vector and data sizes when QEMU is in system mode.
Cc: Richard Henderson
Cc: Palmer Dabbelt
Cc: Alistair Francis
Cc: Bin Meng
Cc: Weiwei Li
Cc: Daniel Henrique Barboza
Cc: Liu Zhiwei
Cc: Helene Chelin
Cc: Nathan Egge
Cc: Max Chou
Paolo Savini (1):
target/riscv
Thanks for the feedback Richard, I'm working on the endianness. Could
you please give me more details about the atomicity issues you are
referring to?
Best wishes
Paolo
On 7/27/24 08:15, Richard Henderson wrote:
On 7/18/24 01:30, Paolo Savini wrote:
This patch optimizes the emulati
f the
vector registers (LMUL=1).
The optimization consists of avoiding the overhead of probing the RAM of the
host machine and doing a loop load/store on the input data grouped in chunks
of as many bytes as possible (8,4,2,1 bytes).
Co-authored-by: Helene CHELIN
Co-authored-by: Paolo Savini
S
erhead for simple RISC-V vector
unit-stride loads and stores
Paolo Savini (1):
target/riscv: rvv: improve performance of RISC-V vector loads and
stores on large amounts of data.
target/riscv/vector_helper.c | 63 +++-
1 file changed, 62 insertions(+),
register and
the destination memory address and vice versa.
This is done only if we have direct access to the RAM of the host machine.
Signed-off-by: Paolo Savini
---
target/riscv/vector_helper.c | 17 -
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/target/riscv