Re: Re: [PATCH] backends/cryptodev-vhost-user: Fix local_error leaks

2024-12-24 Thread zhenwei pi
On 12/24/24 16:59, Philippe Mathieu-Daudé wrote: Hi Gabriel, On 24/12/24 00:46, Gabriel Barrantes wrote:  From c808fa797942b9bd32221594b7eef690a7558b14 Mon Sep 17 00:00:00 2001 From: Gabriel Barrantes Date: Mon, 23 Dec 2024 14:58:12 -0600 Subject: [PATCH] backends/cryptodev-vhost-user: Fix

Re: [PATCH] backends/cryptodev-vhost-user: Fix local_error leaks

2024-12-24 Thread zhenwei pi
LGTM, thanks. Reviewed-by: zhenwei pi On 12/24/24 07:46, Gabriel Barrantes wrote: From c808fa797942b9bd32221594b7eef690a7558b14 Mon Sep 17 00:00:00 2001 From: Gabriel Barrantes Date: Mon, 23 Dec 2024 14:58:12 -0600 Subject: [PATCH] backends/cryptodev-vhost-user: Fix local_error leaks Do not

[PATCH 2/8] futex: Support Windows

2024-12-24 Thread Akihiko Odaki
Windows supports futex-like APIs since Windows 8 and Windows Server 2012. Signed-off-by: Akihiko Odaki --- meson.build | 2 ++ include/qemu/futex.h | 52 ++- tests/unit/test-aio-multithread.c | 2 +- util/lockcnt.c

[PATCH 7/8] migration/colo: Replace QemuSemaphore with QemuEvent

2024-12-24 Thread Akihiko Odaki
colo_exit_sem and colo_incoming_sem represent one-shot events so they can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki --- migration/migration.h | 6 +++--- migration/colo.c | 20 ++-- 2 files changed, 13 insertions(+), 13 deletions(-

[PATCH 5/8] qemu-thread: Use futex if available for QemuLockCnt

2024-12-24 Thread Akihiko Odaki
This unlocks the futex-based implementation of QemuLockCnt to Windows. Signed-off-by: Akihiko Odaki --- include/qemu/lockcnt.h | 2 +- util/lockcnt.c | 7 --- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/qemu/lockcnt.h b/include/qemu/lockcnt.h index f4b62a3f

[PATCH 4/8] qemu-thread: Use futex for QemuEvent on Windows

2024-12-24 Thread Akihiko Odaki
Use the futex-based implementation of QemuEvent on Windows to remove code duplication and remove the overhead of event object construction and destruction. Signed-off-by: Akihiko Odaki --- include/qemu/thread-posix.h | 9 --- include/qemu/thread-win32.h | 6 -- include/qemu/thread.h |

[PATCH 8/8] migration/postcopy: Replace QemuSemaphore with QemuEvent

2024-12-24 Thread Akihiko Odaki
thread_sync_sem is a one-shot event so it can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki --- migration/migration.h| 4 ++-- migration/postcopy-ram.c | 10 +- migration/savevm.c | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-)

[PATCH 6/8] migration: Replace QemuSemaphore with QemuEvent

2024-12-24 Thread Akihiko Odaki
rp_pong_acks tells if it has ever received one pong. QemuEvent is better suited for this usage because it represents a boolean rather than integer and will not decrement with the wait operation. pause_event can utilize qemu_event_reset() to discard events. Signed-off-by: Akihiko Odaki --- migra

[PATCH 1/8] futex: Check value after qemu_futex_wait()

2024-12-24 Thread Akihiko Odaki
futex(2) - Linux manual page https://man7.org/linux/man-pages/man2/futex.2.html > Note that a wake-up can also be caused by common futex usage patterns > in unrelated code that happened to have previously used the futex > word's memory location (e.g., typical futex-based implementations of > Pthrea
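The point of the patch title above, re-checking the futex word after a wait returns, can be sketched without any real futex call. This is a minimal illustration only: `wait_fn`, `wait_until_changed`, `fake_state` and `fake_wait` are hypothetical names, and the fake wait merely simulates wake-ups that arrive while the word is still unchanged. It is not QEMU's qemu_futex_wait().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a futex-style wait. It may return even
 * though the word still holds the expected value. */
typedef void (*wait_fn)(atomic_uint *word, unsigned expected, void *opaque);

/* Loop until the word no longer holds `expected`. A single wait call is
 * not enough, because a wake-up alone does not prove the value changed. */
static unsigned wait_until_changed(atomic_uint *word, unsigned expected,
                                   wait_fn wait, void *opaque)
{
    unsigned cur;

    while ((cur = atomic_load(word)) == expected) {
        wait(word, expected, opaque);
    }
    return cur;
}

/* Simulated waiter state: counts how often the wait was entered. */
struct fake_state {
    int calls;
};

/* Returns with the word unchanged on the first two calls and stores 1
 * on the third call, imitating two wake-ups without a value change. */
static void fake_wait(atomic_uint *word, unsigned expected, void *opaque)
{
    struct fake_state *s = opaque;

    (void)expected;
    if (++s->calls == 3) {
        atomic_store(word, 1);
    }
}
```

With the loop in place the caller only proceeds once the value has really changed, however many early returns the wait produces.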

[PATCH 0/8] Improve futex usage

2024-12-24 Thread Akihiko Odaki
In a recent discussion, Phil Dennis-Jordan pointed out a quirk in QemuEvent destruction due to futex-like abstraction, which prevented the usage of QemuEvent in new and existing code[1]. With some more thoughts after this discussion, I also found other problems and room for improvement in futex usage

[PATCH 3/8] qemu-thread: Avoid futex abstraction for non-Linux

2024-12-24 Thread Akihiko Odaki
qemu-thread used to abstract pthread primitives into futex for the QemuEvent implementation of POSIX systems other than Linux. However, this abstraction has one key difference: unlike futex, pthread primitives require an explicit destruction, and it must be ordered after wait and wake operations.

Re: [PATCH v5 04/11] target/i386/kvm: Only save/load kvmclock MSRs when kvmclock enabled

2024-12-24 Thread Zhao Liu
On Tue, Dec 24, 2024 at 04:31:28PM +0100, Paolo Bonzini wrote: > Date: Tue, 24 Dec 2024 16:31:28 +0100 > From: Paolo Bonzini > Subject: Re: [PATCH v5 04/11] target/i386/kvm: Only save/load kvmclock MSRs > when kvmclock enabled > > On 11/6/24 04:07, Zhao Liu wrote: > > MSR_KVM_SYSTEM_TIME and MSR

Re: [PATCH v5 05/11] target/i386/kvm: Save/load MSRs of kvmclock2 (KVM_FEATURE_CLOCKSOURCE2)

2024-12-24 Thread Zhao Liu
On Tue, Dec 24, 2024 at 04:32:42PM +0100, Paolo Bonzini wrote: > Date: Tue, 24 Dec 2024 16:32:42 +0100 > From: Paolo Bonzini > Subject: Re: [PATCH v5 05/11] target/i386/kvm: Save/load MSRs of kvmclock2 > (KVM_FEATURE_CLOCKSOURCE2) > > On 11/6/24 04:07, Zhao Liu wrote: > > MSR_KVM_SYSTEM_TIME_NEW

Re: [PATCH v5 11/11] target/i386/kvm: Replace ARRAY_SIZE(msr_handlers) with KVM_MSR_FILTER_MAX_RANGES

2024-12-24 Thread Zhao Liu
On Tue, Dec 24, 2024 at 04:54:41PM +0100, Paolo Bonzini wrote: > Date: Tue, 24 Dec 2024 16:54:41 +0100 > From: Paolo Bonzini > Subject: Re: [PATCH v5 11/11] target/i386/kvm: Replace > ARRAY_SIZE(msr_handlers) with KVM_MSR_FILTER_MAX_RANGES > > On 11/6/24 04:07, Zhao Liu wrote: > > kvm_install_ms

Re: [PATCH v5 10/11] target/i386/kvm: Clean up error handling in kvm_arch_init()

2024-12-24 Thread Zhao Liu
On Tue, Dec 24, 2024 at 04:53:36PM +0100, Paolo Bonzini wrote: > Date: Tue, 24 Dec 2024 16:53:36 +0100 > From: Paolo Bonzini > Subject: Re: [PATCH v5 10/11] target/i386/kvm: Clean up error handling in > kvm_arch_init() > > On 11/6/24 04:07, Zhao Liu wrote: > > Currently, there're following incor

Re: [PATCH v6 0/4] i386: Support SMP Cache Topology

2024-12-24 Thread Zhao Liu
> > About smp-cache > > === > > > > The API design has been discussed heavily in [3]. > > > > Now, smp-cache is implemented as a array integrated in -machine. Though > > -machine currently can't support JSON format, this is the one of the > > directions of future. > > > > An example

[PULL 6/6] target/loongarch: Use auto method with LASX feature

2024-12-24 Thread Bibo Mao
Like LSX feature, add type OnOffAuto for LASX feature setting. Signed-off-by: Bibo Mao Reviewed-by: Bibo Mao --- target/loongarch/cpu.c | 50 +++ target/loongarch/cpu.h | 2 ++ target/loongarch/kvm/kvm.c | 53 ++ 3 fil

[PULL 1/6] target/loongarch: Fix vldi inst

2024-12-24 Thread Bibo Mao
From: ghy <2247883...@qq.com> Refer to the link below for a description of the vldi instructions: https://jia.je/unofficial-loongarch-intrinsics-guide/lsx/misc/#synopsis_88 Fixed errors in vldi instruction implementation. Signed-off-by: Guo Hongyu Tested-by: Xianglai Li Signed-off-by: Xianglai

[PULL 4/6] hw/loongarch/virt: Improve fdt table creation for CPU object

2024-12-24 Thread Bibo Mao
For CPU object, possible_cpu_arch_ids() function is used rather than smp.cpus. With command -smp x, -device la464-loongarch-cpu, smp.cpus is not accurate for all possible CPU objects, possible_cpu_arch_ids() is used here. Signed-off-by: Bibo Mao Reviewed-by: Bibo Mao --- hw/loongarch/virt.c | 3

[PULL 2/6] target/loongarch: Use actual operand size with vbsrl check

2024-12-24 Thread Bibo Mao
A hardcoded 32 bytes is used for the vbsrl emulation check, which is a problem when the options lsx=on,lasx=off are used with the vbsrl.v instruction in TCG mode: it injects an LASX exception rather than an LSX exception. Here the actual operand size is used. Cc: qemu-sta...@nongnu.org Fixes: df97f338076 ("target/loongarch: Impl

[PULL 0/6] loongarch-to-apply queue

2024-12-24 Thread Bibo Mao
The following changes since commit aa3a285b5bc56a4208b3b57d4a55291e9c260107: Merge tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu into staging (2024-12-22 14:33:27 -0500) are available in the Git repository at: https://gitlab.com/bibo-mao/qemu.git tags/pull-loongarch-20241

[PULL 5/6] target/loongarch: Use auto method with LSX feature

2024-12-24 Thread Bibo Mao
Like LBT feature, add type OnOffAuto for LSX feature setting. Also add LSX feature detection with new VM ioctl command, fallback to old method if it is not supported. Signed-off-by: Bibo Mao Reviewed-by: Bibo Mao --- target/loongarch/cpu.c | 38 +++ target/loongarch/

[PULL 3/6] hw/loongarch/virt: Create fdt table on machine creation done notification

2024-12-24 Thread Bibo Mao
The same with ACPI table, fdt table is created on machine done notification. Some objects like CPU objects can be created with cold-plug method with command such as -smp x, -device la464-loongarch-cpu, so all objects finish to create when machine is done. Signed-off-by: Bibo Mao Reviewed-by: Bibo

[PATCH 3/5] aspeed: Introduce AST27x0 SoC with Cortex-M4 support

2024-12-24 Thread Steven Lee via
This initial module adds support for the AST27x0 SoC, which features four Cortex-A35 cores and two Cortex-M4 cores. The patch enables emulation of the Cortex-M4 cores, laying the groundwork for co-processor support. Signed-off-by: Steven Lee --- hw/arm/aspeed_ast27x0-cm4.c | 397

[PATCH 4/5] aspeed: Introduce ast2700-fc machine

2024-12-24 Thread Steven Lee via
This patch introduces a new machine, ast2700-fc, which supports all cores available in the AST27x0 SoC. In this machine - The first 4 cores are Cortex-A35 cores. - CPU 4 is designated as the SSP core. - CPU 5 is designated as the TSP core. Test Step: wget https://github.com/stevenlee7189/zeph

[PATCH 5/5] docs: aspeed: Add ast2700-fc machine section

2024-12-24 Thread Steven Lee via
This commit adds a section describing the ast2700-fc multi-SoC machine. Signed-off-by: Steven Lee --- docs/system/arm/aspeed.rst | 50 -- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/docs/system/arm/aspeed.rst b/docs/system/arm/aspeed.rst ind

[PATCH 0/5] Introduce AST27x0 multi-SoC machine

2024-12-24 Thread Steven Lee via
This patch series introduces full cores support for the AST27x0 SoC, along with necessary updates to the ASPEED AST27x0 SOC. The AST27x0 SoC is a new family of ASPEED SoCs featuring 4 Cortex-A35 cores and 2 Cortex-M4 cores. This patch set adds the following updates: 1. Public API updates: Modi

[PATCH 2/5] aspeed: ast27x0: Map unimplemented devices in SoC memory

2024-12-24 Thread Steven Lee via
Maps following unimplemented devices in SoC memory - dpmcu - iomem0 - iomem1 - ltpi - io Signed-off-by: Steven Lee --- hw/arm/aspeed_ast27x0.c | 45 +++-- include/hw/arm/aspeed_soc.h | 6 + 2 files changed, 44 insertions(+), 7 deletions(-) diff --git a/h

[PATCH 1/5] aspeed: Make sdhci_attach_drive and write_boot_rom public

2024-12-24 Thread Steven Lee via
sdhci_attach_drive and write_boot_rom functions may be used by the aspeed machine supporting co-processors. Signed-off-by: Steven Lee --- hw/arm/aspeed.c | 4 ++-- include/hw/arm/aspeed.h | 6 ++ 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/hw/arm/aspeed.c b/hw/arm/

[PATCH v9 11/12] migration/multifd: Add integration tests for multifd with Intel DSA offloading.

2024-12-24 Thread Yichen Wang
From: Hao Xiang * Add test case to start and complete multifd live migration with DSA offloading enabled. * Add test case to start and cancel multifd live migration with DSA offloading enabled. Signed-off-by: Bryan Zhang Signed-off-by: Hao Xiang Signed-off-by: Yichen Wang Reviewed-by: Fabiano

[PATCH v9 02/12] util/dsa: Add idxd into linux header copy list.

2024-12-24 Thread Yichen Wang
Signed-off-by: Yichen Wang Reviewed-by: Fabiano Rosas --- scripts/update-linux-headers.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh index 99a8d9fa4c..9128c7499b 100755 --- a/scripts/update-linux-headers.s

[PATCH v9 06/12] util/dsa: Implement zero page checking in DSA task.

2024-12-24 Thread Yichen Wang
From: Hao Xiang Create DSA task with operation code DSA_OPCODE_COMPVAL. Here we create two types of DSA tasks, a single DSA task and a batch DSA task. Batch DSA task reduces task submission overhead and hence should be the default option. However, due to the way DSA hardware works, a DSA batch ta

[PATCH v9 03/12] util/dsa: Implement DSA device start and stop logic.

2024-12-24 Thread Yichen Wang
From: Hao Xiang * DSA device open and close. * DSA group contains multiple DSA devices. * DSA group configure/start/stop/clean. Signed-off-by: Hao Xiang Signed-off-by: Bryan Zhang Signed-off-by: Yichen Wang Reviewed-by: Fabiano Rosas --- include/qemu/dsa.h | 99 util/dsa.c

[PATCH v9 10/12] util/dsa: Add unit test coverage for Intel DSA task submission and completion.

2024-12-24 Thread Yichen Wang
From: Hao Xiang * Test DSA start and stop path. * Test DSA configure and cleanup path. * Test DSA task submission and completion path. Signed-off-by: Bryan Zhang Signed-off-by: Hao Xiang Signed-off-by: Yichen Wang --- tests/unit/meson.build | 6 + tests/unit/test-dsa.c | 504 +

[PATCH v9 09/12] migration/multifd: Enable DSA offloading in multifd sender path.

2024-12-24 Thread Yichen Wang
From: Hao Xiang Multifd sender path gets an array of pages queued by the migration thread. It performs zero page checking on every page in the array. The pages are classified as either a zero page or a normal page. This change uses Intel DSA to offload the zero page checking from CPU to the DSA ac

[PATCH v9 05/12] util/dsa: Implement DSA task asynchronous completion thread model.

2024-12-24 Thread Yichen Wang
From: Hao Xiang * Create a dedicated thread for DSA task completion. * DSA completion thread runs a loop and poll for completed tasks. * Start and stop DSA completion thread during DSA device start stop. User space application can directly submit task to Intel DSA accelerator by writing to DSA's

[PATCH v9 00/12] Use Intel DSA accelerator to offload zero page checking in multifd live migration.

2024-12-24 Thread Yichen Wang
v9 * Rebase on top of aa3a285b5bc56a4208b3b57d4a55291e9c260107; * Optimize the error handling in multifd_send_setup(); * Use the correct way for skipping unit test; v8 * Rebase on top of 1cf9bc6eba7506ab6d9de635f224259225f63466; * Fixed the hmp parsing crash in migrate_set_parameter; * Addressed i

[PATCH v9 04/12] util/dsa: Implement DSA task enqueue and dequeue.

2024-12-24 Thread Yichen Wang
From: Hao Xiang * Use a safe thread queue for DSA task enqueue/dequeue. * Implement DSA task submission. * Implement DSA batch task submission. Signed-off-by: Hao Xiang Signed-off-by: Yichen Wang Reviewed-by: Fabiano Rosas --- include/qemu/dsa.h | 29 +++ util/dsa.c | 186 ++

[PATCH v9 07/12] util/dsa: Implement DSA task asynchronous submission and wait for completion.

2024-12-24 Thread Yichen Wang
From: Hao Xiang * Add a DSA task completion callback. * DSA completion thread will call the task's completion callback on every task/batch task completion. * DSA submission path to wait for completion. * Implement CPU fallback if DSA is not able to complete the task. Signed-off-by: Hao Xiang S

[PATCH v9 12/12] migration/doc: Add DSA zero page detection doc

2024-12-24 Thread Yichen Wang
From: Yuan Liu Signed-off-by: Yuan Liu Signed-off-by: Yichen Wang Reviewed-by: Fabiano Rosas --- .../migration/dsa-zero-page-detection.rst | 290 ++ docs/devel/migration/features.rst | 1 + 2 files changed, 291 insertions(+) create mode 100644 docs/devel/mig

[PATCH v9 08/12] migration/multifd: Add new migration option for multifd DSA offloading.

2024-12-24 Thread Yichen Wang
From: Hao Xiang Intel DSA offloading is an optional feature that turns on if proper hardware and software stack is available. To turn on DSA offloading in multifd live migration by setting: zero-page-detection=dsa-accel accel-path="dsa: dsa:[dsa_dev_path2] ..." This feature is turned off by def

[PATCH v9 01/12] meson: Introduce new instruction set enqcmd to the build system.

2024-12-24 Thread Yichen Wang
From: Hao Xiang Enable instruction set enqcmd in build. Signed-off-by: Hao Xiang Signed-off-by: Yichen Wang Reviewed-by: Fabiano Rosas --- meson.build | 14 ++ meson_options.txt | 2 ++ scripts/meson-buildoptions.sh | 3 +++ 3 files changed, 19 ins

[PULL 43/72] tcg/optimize: Use fold_masks_zs, fold_masks_s in fold_shift

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Find TempOptInfo once. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 27 ++- 1 file changed, 14 insertions(+), 13 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 2d634c8925..b7

[PULL 05/72] tcg/optimize: Copy mask writeback to fold_masks

2024-12-24 Thread Richard Henderson
Use of fold_masks should be restricted to those opcodes that can reliably make use of it -- those with a single output, and from higher-level folders that set up the masks. Prepare for conversion of each folder in turn. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/opti

[PULL 52/72] tcg/optimize: Re-enable sign-mask optimizations

2024-12-24 Thread Richard Henderson
All instances of s_mask have been converted to the new representation. We can now re-enable usage. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 98b

[PULL 61/72] target/hexagon: Use float32_mul in helper_sfmpy

2024-12-24 Thread Richard Henderson
There are no special cases for this instruction. Remove internal_mpyf as unused. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/fma_emu.h | 1 - target/hexagon/fma_emu.c | 8 target/hexagon/op_helper.c | 2 +- 3 files changed, 1 insertion(+), 10 deletio

[PULL 27/72] tcg/optimize: Use fold_masks_s in fold_nand

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 10d1376f62..7fe5bd6012 100644 --- a/tcg/optimize.c +++ b/tcg

[PULL 71/72] target/hexagon: Simplify internal_mpyhh setup

2024-12-24 Thread Richard Henderson
Initialize x with accumulated via direct assignment, rather than multiplying by 1. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/fma_emu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/hexagon/fma_emu.c b/target/hexagon/fma_emu.c index

[PULL 65/72] target/hexagon: Use float32_muladd for helper_sffm[as]_lib

2024-12-24 Thread Richard Henderson
There are multiple special cases for this instruction. (1) The saturate to normal maximum instead of overflow to infinity is handled by the new float_round_nearest_even_max rounding mode. (2) The 0 * n + c special case is handled by the new float_muladd_suppress_add_product_zero flag. (3) T

[PULL 72/72] accel/tcg: Move gen_intermediate_code to TCGCPUOps.translate_core

2024-12-24 Thread Richard Henderson
Convert all targets simultaneously, as the gen_intermediate_code function disappears from the target. While there are possible workarounds, they're larger than simply performing the conversion. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- include/exec/translator.h

[PULL 04/72] tcg/optimize: Split out fold_affected_mask

2024-12-24 Thread Richard Henderson
There are only a few logical operations which can compute an "affected" mask. Split out handling of this optimization to a separate function, only to be called when applicable. Remove the a_mask field from OptContext, as the mask is no longer stored anywhere. Reviewed-by: Pierrick Bouvier Signe

[PULL 07/72] tcg/optimize: Augment s_mask from z_mask in fold_masks_zs

2024-12-24 Thread Richard Henderson
Consider the passed s_mask to be a minimum deduced from either existing s_mask or from a sign-extension operation. We may be able to deduce more from the set of known zeros. Remove identical logic from several opcode folders. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tc

[PULL 69/72] target/hexagon: Remove Double

2024-12-24 Thread Richard Henderson
This structure, with bitfields, is incorrect for big-endian. Use extract64 and deposit64 instead. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/fma_emu.c | 46 ++-- 1 file changed, 16 insertions(+), 30 deletions(-) diff --git a/
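The endianness argument above can be illustrated with shift-and-mask helpers: unlike bitfields, their result does not depend on how the compiler lays out fields in memory. The sketch below uses local, hypothetical helpers (`extract_bits64`, `deposit_bits64`, `double_to_bits`, `double_biased_exponent`) written in the spirit of extract64/deposit64. It is not the QEMU implementation, and it assumes `double` is IEEE 754 binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return `length` bits of `value`, starting at bit `start` (bit 0 = LSB). */
static uint64_t extract_bits64(uint64_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 64 - start);
    return (value >> start) & (~UINT64_C(0) >> (64 - length));
}

/* Return `value` with the same bit range replaced by `field`. */
static uint64_t deposit_bits64(uint64_t value, int start, int length,
                               uint64_t field)
{
    uint64_t mask;

    assert(start >= 0 && length > 0 && length <= 64 - start);
    mask = (~UINT64_C(0) >> (64 - length)) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Reinterpret a double as its raw 64-bit pattern without aliasing issues. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;

    memcpy(&bits, &d, sizeof(bits));
    return bits;
}

/* Biased exponent of a binary64 value: 11 bits starting at bit 52. */
static int double_biased_exponent(double d)
{
    return (int)extract_bits64(double_to_bits(d), 52, 11);
}
```

Because the field position is expressed as arithmetic on the integer value, the same code gives the same answer on little-endian and big-endian hosts.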

[PULL 45/72] tcg/optimize: Use finish_folding in fold_sub, fold_sub_vec

2024-12-24 Thread Richard Henderson
Duplicate fold_sub_vec into fold_sub instead of calling it, now that fold_sub_vec always returns true. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c i

[PULL 20/72] tcg/optimize: Use fold_masks_s in fold_eqv

2024-12-24 Thread Richard Henderson
Add fold_masks_s as a trivial wrapper around fold_masks_zs. Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c

[PULL 63/72] target/hexagon: Use float32_muladd for helper_sffms

2024-12-24 Thread Richard Henderson
There are no special cases for this instruction. Since hexagon always uses default-nan mode, explicitly negating the first input is unnecessary. Use float_muladd_negate_product instead. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/op_helper.c | 5 ++--- 1 file c

[PULL 67/72] target/hexagon: Expand GEN_XF_ROUND

2024-12-24 Thread Richard Henderson
This massive macro is now only used once. Expand it for use only by float64. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/fma_emu.c | 255 +++ 1 file changed, 127 insertions(+), 128 deletions(-) diff --git a/target/hexagon/fma_

[PULL 70/72] target/hexagon: Use mulu64 for int128_mul_6464

2024-12-24 Thread Richard Henderson
No need to open-code 64x64->128-bit multiplication. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/fma_emu.c | 32 +++- 1 file changed, 3 insertions(+), 29 deletions(-) diff --git a/target/hexagon/fma_emu.c b/target/hexagon/fma_emu.c ind
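For context, this is roughly what an open-coded 64x64->128-bit unsigned multiplication looks like. It is a generic sketch built from 32-bit partial products, not the code removed by the patch, and `mul_64x64_128` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 64-bit values and return the 128-bit product as lo/hi halves. */
static void mul_64x64_128(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits 0..63   */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits 32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits 32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits 64..127 */

    /* Sum of three 32-bit quantities, so it cannot overflow 64 bits. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *lo = (mid << 32) | (uint32_t)p0;
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

The carry handling in `mid` is the part that is easy to get wrong, which is one reason to prefer a shared helper over repeating this pattern.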

[PULL 10/72] tcg/optimize: Introduce const value accessors for TempOptInfo

2024-12-24 Thread Richard Henderson
Introduce ti_is_const, ti_const_val, ti_is_const_val. Signed-off-by: Richard Henderson --- tcg/optimize.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 26d1c5d4a1..5090f6e759 100644 --- a/tcg/optimize.c +++ b/tcg/o

[PULL 49/72] tcg/optimize: Use finish_folding in fold_bitsel_vec

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index d543266b8d..4271d14d2c 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -2833,7 +2833,7 @@ static bool fo

[PULL 22/72] tcg/optimize: Use finish_folding in fold_extract2

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 8111c120af..04ec6fdcef 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -1773,7 +1773,7 @@ static bool fo

[PULL 55/72] softfloat: Add float{16,32,64}_muladd_scalbn

2024-12-24 Thread Richard Henderson
We currently have a flag, float_muladd_halve_result, to scale the result by 2**-1. Extend this to handle arbitrary scaling. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- include/fpu/softfloat.h | 6 fpu/softfloat.c | 58 ++---

[PULL 54/72] tcg/optimize: Move fold_cmp_vec, fold_cmpsel_vec into alphabetic sort

2024-12-24 Thread Richard Henderson
The big comment just above says functions should be sorted. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 60 +- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c

[PULL 64/72] target/hexagon: Use float32_muladd_scalbn for helper_sffma_sc

2024-12-24 Thread Richard Henderson
This instruction has a special case that 0 * x + c returns c without the normal sign folding that comes with 0 + -0. Use the new float_muladd_suppress_add_product_zero to describe this. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/op_helper.c | 11 +++ 1 f

[PULL 31/72] tcg/optimize: Use fold_masks_zs in fold_or

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Find TempOptInfo once. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 4ede218bfc..e284d79fb1 100644

[PULL 50/72] tcg/optimize: Use finish_folding as default in tcg_optimize

2024-12-24 Thread Richard Henderson
All non-default cases now finish folding within each function. Do the same with the default case and assert it is done after. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tcg/optimize.

[PULL 28/72] tcg/optimize: Use fold_masks_z in fold_neg_no_const

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 9 ++--- 1 file changed, 2 insertions(+), 7 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 7fe5bd6012..fbaaece152 100644 --- a/tcg/optimize.c +++ b/tc

[PULL 47/72] tcg/optimize: Use finish_folding in fold_tcg_ld_memcopy

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 7141b18496..047cb5a1ee 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -2685,7 +2685,7 @@ static bool fo

[PULL 66/72] target/hexagon: Remove internal_fmafx

2024-12-24 Thread Richard Henderson
The function is now unused. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/fma_emu.h | 2 - target/hexagon/fma_emu.c | 171 --- 2 files changed, 173 deletions(-) diff --git a/target/hexagon/fma_emu.h b/target/hexagon/fma_emu.h

[PULL 14/72] tcg/optimize: Use fold_masks_zs in fold_count_zeros

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Find TempOptInfo once. Compute s_mask from the union of the maximum count and the op2 fallback for op1 being zero. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 15 ++- 1 file changed, 10 insertions(+), 5 d

[PULL 44/72] tcg/optimize: Simplify sign bit test in fold_shift

2024-12-24 Thread Richard Henderson
Merge the two conditions, sign != 0 && !(z_mask & sign), by testing ~z_mask & sign. If sign == 0, the logical and will produce false. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/tcg/
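The equivalence described above can be checked in isolation. A minimal sketch, assuming `sign` is either zero or a single-bit mask; the function names are hypothetical and this is not the code from tcg/optimize.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Split form: sign is set, and that bit is not possibly-one in z_mask. */
static bool sign_known_zero_split(uint64_t z_mask, uint64_t sign)
{
    return sign != 0 && !(z_mask & sign);
}

/* Merged form: if sign == 0 the AND is 0, so the result is false. */
static bool sign_known_zero_merged(uint64_t z_mask, uint64_t sign)
{
    return (~z_mask & sign) != 0;
}

/* Compare both forms over a few representative masks. */
static void check_sign_test_equivalence(void)
{
    static const uint64_t z_masks[] = {
        0, 1, UINT64_C(0x7fffffff), UINT64_C(0xffffffff),
        UINT64_C(0x7fffffffffffffff), UINT64_MAX,
    };
    static const uint64_t signs[] = {
        0, UINT64_C(1) << 31, UINT64_C(1) << 63,
    };

    for (size_t i = 0; i < sizeof(z_masks) / sizeof(z_masks[0]); i++) {
        for (size_t j = 0; j < sizeof(signs) / sizeof(signs[0]); j++) {
            assert(sign_known_zero_split(z_masks[i], signs[j]) ==
                   sign_known_zero_merged(z_masks[i], signs[j]));
        }
    }
}
```

The two forms agree only under the single-bit assumption; for a multi-bit `sign` the merged test would report true when any one of those bits is clear in `z_mask`.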

[PULL 68/72] target/hexagon: Remove Float

2024-12-24 Thread Richard Henderson
This structure, with bitfields, is incorrect for big-endian. Use the existing float32_getexp_raw which uses extract32. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/fma_emu.c | 16 +++- 1 file changed, 3 insertions(+), 13 deletions(-) diff --git a/targ

[PULL 58/72] softfloat: Remove float_muladd_halve_result

2024-12-24 Thread Richard Henderson
All uses have been converted to float*_muladd_scalbn. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- include/fpu/softfloat.h | 3 --- fpu/softfloat.c | 6 -- fpu/softfloat-parts.c.inc | 4 3 files changed, 13 deletions(-) diff --git a/include/fpu/s

[PULL 42/72] tcg/optimize: Use fold_masks_zs in fold_sextract

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Find TempOptInfo once. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 24 +--- 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 4090ffe12c..2d634c

[PULL 48/72] tcg/optimize: Use fold_masks_zs in fold_xor

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Find TempOptInfo once. Remove fold_masks as the function becomes unused. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/tcg/optimize

[PULL 57/72] target/sparc: Use float*_muladd_scalbn

2024-12-24 Thread Richard Henderson
Use the scalbn interface instead of float_muladd_halve_result. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/sparc/helper.h | 4 +- target/sparc/fop_helper.c | 8 ++-- target/sparc/translate.c | 80 +++ 3 files changed

[PULL 60/72] softfloat: Add float_muladd_suppress_add_product_zero

2024-12-24 Thread Richard Henderson
Certain Hexagon instructions suppress changes to the result when the product of fma() is a true zero. Signed-off-by: Richard Henderson --- include/fpu/softfloat.h | 5 + fpu/softfloat.c | 3 +++ fpu/softfloat-parts.c.inc | 4 +++- 3 files changed, 11 insertions(+), 1 deletion(-)

[PULL 62/72] target/hexagon: Use float32_muladd for helper_sffma

2024-12-24 Thread Richard Henderson
There are no special cases for this instruction. Reviewed-by: Brian Cain Signed-off-by: Richard Henderson --- target/hexagon/op_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c index d257097091..15b143a568 10064

[PULL 30/72] tcg/optimize: Use fold_masks_s in fold_not

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 7 +-- 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index acff3985f3..4ede218bfc 100644 --- a/tcg/optimize.c +++ b/tcg/o

[PULL 51/72] tcg/optimize: Remove z_mask, s_mask from OptContext

2024-12-24 Thread Richard Henderson
All mask setting is now done with parameters via fold_masks_*. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 13 - 1 file changed, 13 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 51cfcb15d2..98b41975af 100644 --- a/tcg/optimi

[PULL 56/72] target/arm: Use float*_muladd_scalbn

2024-12-24 Thread Richard Henderson
Use the scalbn interface instead of float_muladd_halve_result. Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: Richard Henderson --- target/arm/tcg/helper-a64.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c

[PULL 03/72] tcg/optimize: Split out finish_bb, finish_ebb

2024-12-24 Thread Richard Henderson
Call them directly from the opcode switch statement in tcg_optimize, rather than in finish_folding based on opcode flags. Adjust folding of conditional branches to match. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 47 +++-

[PULL 19/72] tcg/optimize: Use finish_folding in fold_dup, fold_dup2

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index a68221a027..803bceb4bd 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -1698,7 +1698,7 @@ static boo

[PULL 02/72] plugins: optimize cpu_index code generation

2024-12-24 Thread Richard Henderson
From: Pierrick Bouvier When running with a single vcpu, we can return a constant instead of a load when accessing cpu_index. A side effect is that all tcg operations using it are optimized, most notably scoreboard access. When running a simple loop in user-mode, the speedup is around 20%. Signed

[PULL 06/72] tcg/optimize: Split out fold_masks_zs

2024-12-24 Thread Richard Henderson
Add a routine to which masks can be passed directly, rather than storing them into OptContext. To be used in upcoming patches. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 15 --- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/

[PULL 37/72] tcg/optimize: Use fold_masks_z in fold_setcond

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 678015a94a..74be827f51 100644 --- a/tcg/optimize.c +++ b/tcg/optim

[PULL 46/72] tcg/optimize: Use fold_masks_zs in fold_tcg_ld

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index cd052a2dbf..7141b18496 100644 --- a/tcg/optimize.c

[PULL 59/72] softfloat: Add float_round_nearest_even_max

2024-12-24 Thread Richard Henderson
This rounding mode is used by Hexagon. Signed-off-by: Richard Henderson --- include/fpu/softfloat-types.h | 2 ++ fpu/softfloat-parts.c.inc | 3 +++ 2 files changed, 5 insertions(+) diff --git a/include/fpu/softfloat-types.h b/include/fpu/softfloat-types.h index 79ca44dcc3..9d37cdfaa8 10064

[PULL 53/72] tcg/optimize: Move fold_bitsel_vec into alphabetic sort

2024-12-24 Thread Richard Henderson
The big comment just above says functions should be sorted. Add forward declarations as needed. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 114 + 1 file changed, 59 insertions(+), 55 deletions(-) diff --gi

[PULL 41/72] tcg/optimize: Use finish_folding in fold_cmpsel_vec

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index ccdac7b7d7..4090ffe12c 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -2501,7 +2501,7 @@ static bool fo

[PULL 35/72] tcg/optimize: Use finish_folding in fold_remainder

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 07792c5351..e78f5a79a3 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -2152,7 +2152,7 @@ static bool fo

Re: [PATCH V5 23/23] migration: cpr-transfer documentation

2024-12-24 Thread Peter Xu
On Tue, Dec 24, 2024 at 08:17:08AM -0800, Steve Sistare wrote: > Signed-off-by: Steve Sistare (Not suggested to have empty commit log; can say something!) Reviewed-by: Peter Xu -- Peter Xu

[PULL 33/72] tcg/optimize: Use fold_masks_zs in fold_qemu_ld

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Be careful not to call fold_masks_zs when the memory operation is wide enough to require multiple outputs, so split into two functions: fold_qemu_ld_1reg and fold_qemu_ld_2reg. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimiz

[PULL 00/72] tcg patch queue

2024-12-24 Thread Richard Henderson
The following changes since commit aa3a285b5bc56a4208b3b57d4a55291e9c260107: Merge tag 'mem-2024-12-21' of https://github.com/davidhildenbrand/qemu into staging (2024-12-22 14:33:27 -0500) are available in the Git repository at: https://gitlab.com/rth7680/qemu.git tags/pull-tc

[PULL 40/72] tcg/optimize: Use finish_folding in fold_cmp_vec

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index c61d0eae4e..ccdac7b7d7 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -2480,7 +2480,7 @@ static bool fo

[PULL 26/72] tcg/optimize: Use finish_folding in fold_mul*

2024-12-24 Thread Richard Henderson
Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 0104582b3a..10d1376f62 100644 --- a/tcg/optimize.c +++ b/tcg/optimize.c @@ -1969,7 +1969,7 @@ static b

[PULL 16/72] tcg/optimize: Use fold_and and fold_masks_z in fold_deposit

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Find TempOptInfo once. When we fold to and, use fold_and. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 35 +-- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/tcg/optim

[PULL 11/72] tcg/optimize: Use fold_masks_zs in fold_and

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Find TempOptInfo once. Sink mask computation below fold_affected_mask early exit. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 30 -- 1 file changed, 16 insertions(+), 14 deletions(-) dif

[PULL 15/72] tcg/optimize: Use fold_masks_z in fold_ctpop

2024-12-24 Thread Richard Henderson
Add fold_masks_z as a trivial wrapper around fold_masks_zs. Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c
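As a rough illustration of the wrapper pattern this patch describes (masks passed as parameters, with the z-only variant forwarding to the zs variant), here is a minimal self-contained sketch. The `SketchInfo` type and the `sketch_*` function names are hypothetical stand-ins, not the code in tcg/optimize.c. The assumptions that the wrapper passes an s_mask of 0 and that the return value signals an all-zero result are made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-temporary optimizer info (not QEMU's type). */
typedef struct {
    uint64_t z_mask;  /* bits of the result that may be nonzero */
    uint64_t s_mask;  /* bits of the result known to repeat the sign bit */
} SketchInfo;

/*
 * Masks arrive as parameters rather than being stored in a shared
 * context first. Returns true when z_mask shows the result must be
 * zero, i.e. a caller could replace the operation with a constant.
 */
static bool sketch_fold_masks_zs(SketchInfo *out, uint64_t z_mask,
                                 uint64_t s_mask)
{
    out->z_mask = z_mask;
    out->s_mask = s_mask;
    return z_mask == 0;
}

/* Trivial wrapper: no sign-bit knowledge (assumed here to mean s_mask = 0). */
static bool sketch_fold_masks_z(SketchInfo *out, uint64_t z_mask)
{
    return sketch_fold_masks_zs(out, z_mask, 0);
}
```

Passing the masks explicitly keeps each fold function's inputs visible at the call site, which is the motivation the series gives for avoiding the OptContext slots.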

[PULL 24/72] tcg/optimize: Use fold_masks_z in fold_extu

2024-12-24 Thread Richard Henderson
Avoid the use of the OptContext slots. Reviewed-by: Pierrick Bouvier Signed-off-by: Richard Henderson --- tcg/optimize.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tcg/optimize.c b/tcg/optimize.c index 3aafe039ed..f62e7adfe1 100644 --- a/tcg/optimize.c +++ b/tcg/opt
