Re: [PATCH 1/1 v2] [RISC-V/RVV] Generate strided vector loads/stores with tcg nodes.

2025-07-10 Thread Paolo Savini
ed, Mar 12, 2025 at 03:55:47PM +0000, Paolo Savini wrote: This commit improves the performance of QEMU when emulating strided vector loads and stores by substituting the call for the helper function with the generation of equivalent TCG operations. Signed-off-by: Paolo Savini Reviewed-by: Danie

[PATCH 1/1 v4] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2025-03-13 Thread Paolo Savini
can update vstart correctly. We also use the helper function when it performs better than tcg for specific combinations of vector length, number of fields and element size. Signed-off-by: Paolo Savini Reviewed-by: Daniel Henrique Barboza Reviewed-by: Richard Henderson Reviewed-by: Max Chou

[PATCH 0/1 v4] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2025-03-13 Thread Paolo Savini
elene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores. target/riscv/insn_trans/trans_rvv.c.inc | 155 +--- 1 file changed, 108 insertions(+

[PATCH 1/1 v2] [RISC-V/RVV] Expand the probe_pages helper function to handle probe flags.

2025-03-13 Thread Paolo Savini
direct call to probe_access_flags. Signed-off-by: Paolo Savini Reviewed-by: Daniel Henrique Barboza --- target/riscv/vector_helper.c | 57 +++- 1 file changed, 37 insertions(+), 20 deletions(-) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c

[PATCH 0/1 v2] [RISC-V/RVV] use a single function to probe memory.

2025-03-13 Thread Paolo Savini
Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): [RISC-V/RVV] Expand the probe_pages helper function to handle probe flags. target/riscv/vector_helper.c | 57 +++- 1 file changed

[PATCH 1/1 v3] target/riscv: optimize the memory probing for vector fault-only-first loads.

2025-03-12 Thread Paolo Savini
flag that is not a watchpoint flag (that per standard is allowed by this instruction) we proceed with the per element probing to find the index of the element causing the exception and set vl to such index. Signed-off-by: Paolo Savini Reviewed-by: Daniel Henrique Barboza --- target/riscv
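The per-element probing described in this snippet can be sketched roughly as follows. This is only an illustration, not the patch's code: `ff_new_vl`, `probe_fn` and `probe_below_limit` are hypothetical names, the single whole-range probe on the fast path is an assumption drawn from the patch title, and watchpoint flags as well as the element-0 case (which the caller would have to treat as a real exception) are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe callback: returns true when [addr, addr + size) is accessible. */
typedef bool (*probe_fn)(uint64_t addr, size_t size, void *opaque);

/*
 * Hypothetical sketch of fault-only-first probing.
 * Fast path (assumed): probe the whole access once.
 * Slow path: probe element by element and return the index of the
 * first faulting element, which becomes the new vl.
 */
static uint32_t ff_new_vl(uint64_t base, uint32_t esz, uint32_t vl,
                          probe_fn probe, void *opaque)
{
    if (vl == 0 || probe(base, (size_t)vl * esz, opaque)) {
        return vl; /* nothing faults, vl is unchanged */
    }
    for (uint32_t i = 0; i < vl; i++) {
        if (!probe(base + (uint64_t)i * esz, esz, opaque)) {
            return i; /* first faulting element */
        }
    }
    return vl;
}

/* Toy probe for illustration: memory is accessible below *limit only. */
static bool probe_below_limit(uint64_t addr, size_t size, void *opaque)
{
    const uint64_t *limit = opaque;
    return addr + size <= *limit;
}
```

With 4-byte elements starting at 0x1000 and only the first 20 bytes accessible, the sketch trims a requested vl of 8 down to 5.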

[PATCH 0/1 v3] [RISC-V/RVV] optimize the memory probing for vector fault-only-first loads.

2025-03-12 Thread Paolo Savini
Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): target/riscv: optimize the memory probing for vector fault-only-first loads

[PATCH 0/1 v2] [RISCV/RVV] Generate strided vector loads/stores with tcg nodes.

2025-03-12 Thread Paolo Savini
: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): [RISC-V/RVV] Generate strided vector loads/stores with tcg nodes. target/riscv/insn_trans/trans_rvv.c.inc | 323 1 file changed, 273 insertions(+), 50 deletions(-) -- 2.34.1

[PATCH 1/1 v2] [RISC-V/RVV] Generate strided vector loads/stores with tcg nodes.

2025-03-12 Thread Paolo Savini
This commit improves the performance of QEMU when emulating strided vector loads and stores by substituting the call for the helper function with the generation of equivalent TCG operations. Signed-off-by: Paolo Savini Reviewed-by: Daniel Henrique Barboza --- target/riscv/insn_trans

[PATCH 0/1] [RISC-V/RVV] use a single function to probe memory.

2025-02-21 Thread Paolo Savini
t the flags. Cc: Richard Henderson Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): [RISC-V/RVV] Expand the probe_pages h

[PATCH 1/1] [RISC-V/RVV] Expand the probe_pages helper function to handle probe flags.

2025-02-21 Thread Paolo Savini
direct call to probe_access_flags. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 57 +++- 1 file changed, 37 insertions(+), 20 deletions(-) diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c index 772cff8fbe..c0f1b7994e 100644

[PATCH 1/1 V2] [RISC-V/RVV] optimize the memory probing for vector fault-only-first loads.

2025-02-21 Thread Paolo Savini
flag that is not a watchpoint flag (that per standard is allowed by this instruction) we proceed with the per element probing to find the index of the element causing the exception and set vl to such index. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 103

[PATCH 0/1 v2] [RISC-V/RVV] optimize the memory probing for vector fault-only-first loads.

2025-02-21 Thread Paolo Savini
change the heading from RFC to PATCH. I also take the opportunity to thank Daniel Barboza for the review. Cc: Richard Henderson Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou

[PATCH 0/1] [RISCV/RVV] Generate strided vector loads/stores with tcg nodes.

2025-02-11 Thread Paolo Savini
Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): [RISC-V/RVV] Generate strided vector loads/stores with tcg nodes. target/riscv/insn_trans/trans_rvv.c.inc | 294 1 file changed, 244

[PATCH 1/1] [RISC-V/RVV] Generate strided vector loads/stores with tcg nodes.

2025-02-11 Thread Paolo Savini
This commit improves the performance of QEMU when emulating strided vector loads and stores by substituting the call for the helper function with the generation of equivalent TCG operations. Signed-off-by: Paolo Savini --- target/riscv/insn_trans/trans_rvv.c.inc | 294

[RFC 0/1 v1] target/riscv: optimize the memory probing for vector fault-only-first loads.

2025-01-29 Thread Paolo Savini
: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): target/riscv: optimize the memory probing for vector fault-only-first loads. ta

[RFC 1/1 v1] target/riscv: optimize the memory probing for vector fault-only-first loads.

2025-01-29 Thread Paolo Savini
flag that is not a watchpoint flag (that per standard is allowed by this instruction) we proceed with the per element probing to find the index of the element causing the exception and set vl to such index. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 91

Re: [RFC 1/1 v3] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2025-01-23 Thread Paolo Savini
Hi Alex, thanks for the review! On 1/22/25 17:43, Alex Bennée wrote: Paolo Savini writes: This patch replaces the use of a helper function with direct tcg ops generation in order to emulate whole register loads and stores. This is done in order to improve the performance of QEMU. Generally

[RFC 0/1 v3] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2025-01-22 Thread Paolo Savini
ur of the patch. Cc: Richard Henderson Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): target/riscv: use tcg ops genera

[RFC 1/1 v3] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2025-01-22 Thread Paolo Savini
can update vstart correctly. We also use the helper function when it performs better than tcg for specific combinations of vector length, number of fields and element size. Signed-off-by: Paolo Savini --- target/riscv/insn_trans/trans_rvv.c.inc | 164 +--- 1 file changed, 119

[RFC 0/1 v2] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2024-12-20 Thread Paolo Savini
w and explanations. Cc: Richard Henderson Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): target/riscv: use tcg ops generati

[RFC 1/1 v2] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2024-12-20 Thread Paolo Savini
register load or store. Signed-off-by: Paolo Savini --- target/riscv/insn_trans/trans_rvv.c.inc | 125 +++- 1 file changed, 78 insertions(+), 47 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index b9883a5d32..c2c2c53254

[RFC 0/1] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2024-12-20 Thread Paolo Savini
almer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): target/riscv: use tcg ops generation to emulate whole reg rvv loads/s

[PATCH 1/1] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2024-12-20 Thread Paolo Savini
register load or store. Signed-off-by: Paolo Savini --- target/riscv/insn_trans/trans_rvv.c.inc | 125 +++- 1 file changed, 78 insertions(+), 47 deletions(-) diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc index b9883a5d32..c2c2c53254

[RFC 1/1] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2024-12-18 Thread Paolo Savini
This patch aims at emulating the whole register loads and stores through direct generation of tcg operations rather than through the aid of a helper function. Signed-off-by: Paolo Savini --- target/riscv/insn_trans/trans_rvv.c.inc | 104 +--- 1 file changed, 56 insertions

[RFC 0/1] target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.

2024-12-18 Thread Paolo Savini
athan Egge Cc: Max Chou Cc: Jeremy Bennett Cc: Craig Blackmore Paolo Savini (1): target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores. target/riscv/insn_trans/trans_rvv.c.inc | 104 +--- 1 file changed, 56 insertions(+), 48 deletions(-) -- 2.34.1

Re: [RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-11-11 Thread Paolo Savini
/qemu/blob/134b443512825bed401b6e141447b8cdc22d2efe/target/riscv/vector_helper.c#L224 Thanks Paolo On 11/8/24 09:11, Richard Henderson wrote: On 11/7/24 12:58, Daniel Henrique Barboza wrote: On 11/4/24 9:48 AM, Richard Henderson wrote: On 10/30/24 15:25, Paolo Savini wrote: On 10/30/24 11:4

[RFC v5 0/1] target/riscv: rvv: reduce the overhead for simple RISC-V vector.

2024-11-11 Thread Paolo Savini
Version 5 of the patch set splits the patches into independent submissions so as to simplify the review process. Previous versions: - v1: https://lore.kernel.org/all/20240717153040.11073-1-paolo.sav...@embecosm.com/ - v2: https://lore.kernel.org/all/20241002135708.99146-1-paolo.sav...@embecosm.

[RFC v5 1/1] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-11-11 Thread Paolo Savini
sters (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini Signed-off-by: Helene C
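The chunked load/store loop mentioned in this snippet could look roughly like the sketch below. It is an illustration only: `copy_in_chunks` is a hypothetical name, not a function from the patch, and it works on plain host buffers, whereas the real helper operates on guest memory and vector registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy len bytes from src to dst, always using the
 * widest chunk (8, then 4, 2, 1 bytes) that still fits in what remains.
 * Returns the number of chunk copies performed.
 */
static size_t copy_in_chunks(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t ops = 0;

    assert(dst != NULL && src != NULL);
    while (len > 0) {
        size_t chunk = len >= 8 ? 8 : len >= 4 ? 4 : len >= 2 ? 2 : 1;

        /* A fixed-size memcpy is typically lowered to a single load/store pair. */
        memcpy(dst, src, chunk);
        dst += chunk;
        src += chunk;
        len -= chunk;
        ops++;
    }
    return ops;
}
```

A 15-byte copy, for example, is done as one 8-byte, one 4-byte, one 2-byte and one 1-byte access, while a 16-byte copy takes two 8-byte accesses.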

Re: [RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-10-30 Thread Paolo Savini
Thanks for the review Richard. On 10/30/24 11:40, Richard Henderson wrote: On 10/29/24 19:43, Paolo Savini wrote: This patch optimizes the emulation of unit-stride load/store RVV instructions when the data being loaded/stored per iteration amounts to 16 bytes or more. The optimization

[RFC v4 0/2] target/riscv: add wrapper for target specific macros in atomicity check.

2024-10-29 Thread Paolo Savini
Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Helene CHELIN (1): target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores Paolo Savini (1): target/riscv: rvv: improve performance of RISC-V vector loads and stores on

[RFC v4 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-10-29 Thread Paolo Savini
and the destination memory address and vice versa. This is done only if we have direct access to the RAM of the host machine, if the host is little endian and if it supports atomic 128-bit memory operations. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 17

[RFC v4 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-10-29 Thread Paolo Savini
sters (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini Signed-off-by: Helene C

[RFC v3 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-10-14 Thread Paolo Savini
f the vector registers (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini S

[RFC v3 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-10-14 Thread Paolo Savini
register and the destination memory address and vice versa. This is done only if we have direct access to the RAM of the host machine, if the host is little endian and if it supports atomic 128-bit memory operations. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 14 +- 1

[RFC v3 0/2] target/riscv: add endianness checks and atomicity guarantees.

2024-10-14 Thread Paolo Savini
Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Helene CHELIN (1): target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores Paolo Savini

[RFC v2 2/2] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-10-02 Thread Paolo Savini
The simplified emulation of vector loads and stores that bypasses the memory probing in the vext_ldst_us helper function seems to benefit only the user mode. We therefore limit this approach to the user mode configuration. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 3 ++- 1

[RFC v2 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-10-02 Thread Paolo Savini
f the vector registers (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini S

[RFC v2 0/2] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-10-02 Thread Paolo Savini
Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Helene CHELIN (1): target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores Paolo Savini (1): target/riscv: use a simplified loop to emulate

[RFC 1/1] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-09-25 Thread Paolo Savini
The simplified emulation of vector loads and stores that bypasses the memory probing in the vext_ldst_us helper function seems to benefit only the user mode. We therefore limit this approach to the user mode configuration. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 3 ++- 1

[RFC 0/1] target/riscv: use a simplified loop to emulate rvv loads/stores only in user mode.

2024-09-25 Thread Paolo Savini
load/store loop for small vector and data sizes when QEMU is in system mode. Cc: Richard Henderson Cc: Palmer Dabbelt Cc: Alistair Francis Cc: Bin Meng Cc: Weiwei Li Cc: Daniel Henrique Barboza Cc: Liu Zhiwei Cc: Helene Chelin Cc: Nathan Egge Cc: Max Chou Paolo Savini (1): target/riscv

Re: [RFC 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-09-10 Thread Paolo Savini
Thanks for the feedback Richard, I'm working on the endianness. Could you please give me more details about the atomicity issues you are referring to? Best wishes Paolo On 7/27/24 08:15, Richard Henderson wrote: On 7/18/24 01:30, Paolo Savini wrote: This patch optimizes the emulati

[RFC 1/2] target/riscv: rvv: reduce the overhead for simple RISC-V vector unit-stride loads and stores

2024-07-17 Thread Paolo Savini
f the vector registers (LMUL=1). The optimization consists of avoiding the overhead of probing the RAM of the host machine and doing a loop load/store on the input data grouped in chunks of as many bytes as possible (8,4,2,1 bytes). Co-authored-by: Helene CHELIN Co-authored-by: Paolo Savini S

[RFC 0/2] Improve the performance of unit-stride RVV ld/st on

2024-07-17 Thread Paolo Savini
erhead for simple RISC-V vector unit-stride loads and stores Paolo Savini (1): target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data. target/riscv/vector_helper.c | 63 +++- 1 file changed, 62 insertions(+),

[RFC 2/2] target/riscv: rvv: improve performance of RISC-V vector loads and stores on large amounts of data.

2024-07-17 Thread Paolo Savini
register and the destination memory address and vice versa. This is done only if we have direct access to the RAM of the host machine. Signed-off-by: Paolo Savini --- target/riscv/vector_helper.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/target/riscv