v1: https://lore.kernel.org/qemu-devel/[email protected]/
v2: https://lore.kernel.org/qemu-devel/[email protected]/
v3: https://lore.kernel.org/qemu-devel/[email protected]/

Based-on: [email protected]
("[PATCH v4 00/54] tcg: Simplify calls to load/store helpers")

The main objective here is to support Arm FEAT_LSE2, which says that
any single memory access that does not cross a 16-byte boundary is
atomic. This is the MO_ATOM_WITHIN16 control.

While I'm touching all of this, a secondary objective is to handle the
atomicity of the IBM machines. Both Power and s390x treat misaligned
accesses as atomic at the granularity given by the least significant
set bit of the pointer. For instance, an 8-byte access at ptr % 8 == 4
will appear as two atomic 4-byte accesses, and one at ptr % 4 == 2
will appear as four atomic 2-byte accesses. This is the
MO_ATOM_SUBALIGN control.

By default, accesses are atomic only if aligned, which is the current
behaviour of the tcg code generator (mostly, anyway; there were bugs).
This is the MO_ATOM_IFALIGN control.

Further, one can say that a large memory access is really a set of
contiguous smaller accesses, and we need not provide more atomicity
than that (modulo MO_ATOM_WITHIN16). This is the MO_ATMAX_* control.

Changes for v4:
  - Rebase, fixing some conflicts.
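To make the three per-access atomicity rules concrete, here is a minimal
C sketch that computes the size of the pieces in which an access is
guaranteed to be single-copy atomic under each rule. The enum and
function names are hypothetical and do not correspond to the actual
MO_ATOM_* encodings or to ldst_atomicity.c.inc. It assumes power-of-two
access sizes from 1 to 16 bytes, and it assumes that an access which
crosses a 16-byte boundary under the FEAT_LSE2 rule keeps only byte
atomicity. The MO_ATMAX_* control is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names modelling the rules described above; these are
 * not QEMU's MO_ATOM_* constants or their encodings. */
enum atom_mode {
    ATOM_IFALIGN,   /* whole access atomic only if naturally aligned */
    ATOM_WITHIN16,  /* whole access atomic if it stays in one 16-byte block */
    ATOM_SUBALIGN,  /* atomic in pieces sized by the lowest set address bit */
};

/*
 * Return the size in bytes of the pieces in which an access of SIZE bytes
 * at ADDR is single-copy atomic. SIZE means the whole access is atomic;
 * 1 means only individual bytes are atomic.
 */
unsigned atomic_unit(uintptr_t addr, unsigned size, enum atom_mode mode)
{
    /* Sketch assumption: power-of-two sizes from 1 to 16 bytes. */
    assert(size >= 1 && size <= 16 && (size & (size - 1)) == 0);

    switch (mode) {
    case ATOM_IFALIGN:
        return (addr % size == 0) ? size : 1;
    case ATOM_WITHIN16:
        /* Assumption: crossing a 16-byte boundary leaves byte atomicity only. */
        return (addr % 16 + size <= 16) ? size : 1;
    case ATOM_SUBALIGN: {
        /* Lowest set bit of the address; 0 when addr == 0 (fully aligned). */
        uintptr_t low = addr & -addr;
        return (low == 0 || low >= size) ? size : (unsigned)low;
    }
    }
    return 1;
}
```

For example, atomic_unit(0x1004, 8, ATOM_SUBALIGN) yields 4 (two 4-byte
halves) and atomic_unit(0x1002, 8, ATOM_SUBALIGN) yields 2 (four 2-byte
pieces). The same 0x1004 access yields 1 under ATOM_IFALIGN but 8 under
ATOM_WITHIN16, while an 8-byte access at 0x100c crosses a 16-byte
boundary and yields 1 under ATOM_WITHIN16.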

r~

Richard Henderson (57):
  include/exec/memop: Add bits describing atomicity
  accel/tcg: Add cpu_in_serial_context
  accel/tcg: Introduce tlb_read_idx
  accel/tcg: Reorg system mode load helpers
  accel/tcg: Reorg system mode store helpers
  accel/tcg: Honor atomicity of loads
  accel/tcg: Honor atomicity of stores
  target/loongarch: Do not include tcg-ldst.h
  tcg: Unify helper_{be,le}_{ld,st}*
  accel/tcg: Implement helper_{ld,st}*_mmu for user-only
  tcg/tci: Use helper_{ld,st}*_mmu for user-only
  tcg: Add 128-bit guest memory primitives
  meson: Detect atomic128 support with optimization
  tcg/i386: Add have_atomic16
  accel/tcg: Use have_atomic16 in ldst_atomicity.c.inc
  accel/tcg: Add aarch64 specific support in ldst_atomicity
  tcg/aarch64: Detect have_lse, have_lse2 for linux
  tcg/aarch64: Detect have_lse, have_lse2 for darwin
  accel/tcg: Add have_lse2 support in ldst_atomicity
  tcg: Introduce TCG_OPF_TYPE_MASK
  tcg/i386: Use full load/store helpers in user-only mode
  tcg/aarch64: Use full load/store helpers in user-only mode
  tcg/ppc: Use full load/store helpers in user-only mode
  tcg/loongarch64: Use full load/store helpers in user-only mode
  tcg/riscv: Use full load/store helpers in user-only mode
  tcg/arm: Adjust constraints on qemu_ld/st
  tcg/arm: Use full load/store helpers in user-only mode
  tcg/mips: Use full load/store helpers in user-only mode
  tcg/s390x: Use full load/store helpers in user-only mode
  tcg/sparc64: Allocate %g2 as a third temporary
  tcg/sparc64: Rename tcg_out_movi_imm13 to tcg_out_movi_s13
  tcg/sparc64: Rename tcg_out_movi_imm32 to tcg_out_movi_u32
  tcg/sparc64: Split out tcg_out_movi_s32
  tcg/sparc64: Use standard slow path for softmmu
  accel/tcg: Remove helper_unaligned_{ld,st}
  tcg/loongarch64: Assert the host supports unaligned accesses
  tcg/loongarch64: Support softmmu unaligned accesses
  tcg/riscv: Support softmmu unaligned accesses
  tcg: Introduce tcg_target_has_memory_bswap
  tcg: Add INDEX_op_qemu_{ld,st}_i128
  tcg: Support TCG_TYPE_I128 in tcg_out_{ld,st}_helper_{args,ret}
  tcg: Introduce atom_and_align_for_opc
  tcg/i386: Use atom_and_align_for_opc
  tcg/aarch64: Use atom_and_align_for_opc
  tcg/arm: Use atom_and_align_for_opc
  tcg/loongarch64: Use atom_and_align_for_opc
  tcg/mips: Use atom_and_align_for_opc
  tcg/ppc: Use atom_and_align_for_opc
  tcg/riscv: Use atom_and_align_for_opc
  tcg/s390x: Use atom_and_align_for_opc
  tcg/sparc64: Use atom_and_align_for_opc
  tcg/i386: Honor 64-bit atomicity in 32-bit mode
  tcg/i386: Support 128-bit load/store with have_atomic16
  tcg/aarch64: Rename temporaries
  tcg/aarch64: Support 128-bit load/store
  tcg/ppc: Support 128-bit load/store
  tcg/s390x: Support 128-bit load/store

 accel/tcg/internal.h | 5 +
 accel/tcg/tcg-runtime.h | 3 +
 include/exec/cpu-defs.h | 7 +-
 include/exec/cpu_ldst.h | 26 +-
 include/exec/memop.h | 36 +
 include/qemu/cpuid.h | 18 +
 include/tcg/tcg-ldst.h | 72 +-
 include/tcg/tcg-opc.h | 8 +
 include/tcg/tcg.h | 22 +-
 tcg/aarch64/tcg-target-con-set.h | 2 +
 tcg/aarch64/tcg-target.h | 6 +-
 tcg/arm/tcg-target-con-set.h | 16 +-
 tcg/arm/tcg-target-con-str.h | 5 +-
 tcg/arm/tcg-target.h | 3 +-
 tcg/i386/tcg-target.h | 7 +-
 tcg/loongarch64/tcg-target.h | 3 +-
 tcg/mips/tcg-target.h | 4 +-
 tcg/ppc/tcg-target-con-set.h | 2 +
 tcg/ppc/tcg-target-con-str.h | 1 +
 tcg/ppc/tcg-target.h | 4 +-
 tcg/riscv/tcg-target.h | 4 +-
 tcg/s390x/tcg-target-con-set.h | 2 +
 tcg/s390x/tcg-target.h | 4 +-
 tcg/sparc64/tcg-target-con-set.h | 2 -
 tcg/sparc64/tcg-target-con-str.h | 1 -
 tcg/sparc64/tcg-target.h | 4 +-
 tcg/tcg-internal.h | 2 +
 tcg/tci/tcg-target.h | 4 +-
 accel/tcg/cpu-exec-common.c | 3 +
 accel/tcg/cputlb.c | 1799 ++++++++++++++++++------------
 accel/tcg/tb-maint.c | 2 +-
 accel/tcg/user-exec.c | 489 +++++---
 target/loongarch/csr_helper.c | 1 -
 target/loongarch/iocsr_helper.c | 1 -
 tcg/optimize.c | 15 +-
 tcg/tcg-op.c | 265 +++--
 tcg/tcg.c | 270 ++++-
 tcg/tci.c | 150 +--
 accel/tcg/ldst_atomicity.c.inc | 1373 +++++++++++++++++++++++
 docs/devel/loads-stores.rst | 36 +-
 docs/devel/tcg-ops.rst | 11 +-
 meson.build | 52 +-
 tcg/aarch64/tcg-target.c.inc | 384 +++++--
 tcg/arm/tcg-target.c.inc | 121 +-
 tcg/i386/tcg-target.c.inc | 366 ++++--
 tcg/loongarch64/tcg-target.c.inc | 91 +-
 tcg/mips/tcg-target.c.inc | 104 +-
 tcg/ppc/tcg-target.c.inc | 261 +++--
 tcg/riscv/tcg-target.c.inc | 132 +--
 tcg/s390x/tcg-target.c.inc | 177 ++-
 tcg/sparc64/tcg-target.c.inc | 714 ++++-------
 tcg/tci/tcg-target.c.inc | 8 +-
 52 files changed, 4663 insertions(+), 2435 deletions(-)
 create mode 100644 accel/tcg/ldst_atomicity.c.inc

-- 
2.34.1
