From: Aleksandar Markovic <aleksandar.marko...@mips.com>

This series introduces the MTTCG feature for MIPS targets by adding the
missing bits and pieces and by formally enabling the corresponding QEMU
builds to support such configurations.

PATCH ORGANIZATION
==================

The patches are organized as follows:

  - Patches 1 and 2 improve the emulation of the MIPS LL/SC instructions
    for MTTCG. They are based on a previously sent patch series by
    Leon Alrae (this is the last version, v3):

    http://lists.gnu.org/archive/html/qemu-devel/2016-09/msg06870.html

  - Patches 3, 4, 5, and 6 deal with locking/synchronization issues
    that surfaced while introducing MTTCG for MIPS. Similar sets of
    patches have already been integrated for some other platforms
    (arm, intel, ppc, sparc).

  - Patch 7 just enables the QEMU build system to support the MTTCG
    feature for MIPS targets.

PERFORMANCE TESTING
===================

Performance testing was performed using the atomic_add-bench test
program, which tests LL/SC-related functionality in a multithreaded
environment. MTTCG throughput was higher than non-MTTCG throughput in
every measured configuration (see the numerical data below). For the
sake of comparison, the test case organization mimics the one from a
previously sent patch set:

  target-arm: emulate aarch64's LL/SC using cmpxchg helpers
  https://lists.gnu.org/archive/html/qemu-devel/2016-10/msg06653.html

-----------------------------------------------------------------------
[ASCII plot: atomic_add-bench: 1000000 ops/thread, [0,1] range.
 Y axis: throughput, 0 to 50. X axis: number of threads, 0 to 60.
 Series: M - MTTCG, N - no MTTCG.]
-----------------------------------------------------------------------
[ASCII plot: atomic_add-bench: 1000000 ops/thread, [0,2] range.
 Y axis: throughput, 0 to 50. X axis: number of threads, 0 to 60.
 Series: M - MTTCG, N - no MTTCG.]
-----------------------------------------------------------------------
[ASCII plot: atomic_add-bench: 1000000 ops/thread, [0,1] range.
 Y axis: throughput, 0 to 150. X axis: number of threads, 0 to 60.
 Series: M - MTTCG, N - no MTTCG.]
-----------------------------------------------------------------------
[ASCII plot: atomic_add-bench: 1000000 ops/thread, [0,2] range.
 Y axis: throughput, 0 to 150. X axis: number of threads, 0 to 60.
 Series: M - MTTCG, N - no MTTCG.]
-----------------------------------------------------------------------

Numerical data:

Ops Range-->        1               2              128             1024
        # of      no              no              no              no
        thr.   MTTCG   MTTCG   MTTCG   MTTCG   MTTCG   MTTCG   MTTCG   MTTCG
           1    4.95   42.61    4.94   42.27    4.89   42.24    4.85   41.81
           2    1.23   18.41    1.29   25.71    1.33   57.41    1.36   60.34
           4    0.46   11.99    0.48   19.69    0.53   78.98    0.50   95.39
           8    0.18    9.59    0.18   19.11    0.19  104.66    0.20  112.66
          16    0.11   11.19    0.12   19.12    0.12  108.29    0.13  121.90
          24    0.10   10.18    0.09   19.14    0.11  115.53    0.10  127.40
          32    0.11   11.15    0.12   19.36    0.09  120.60    0.10  131.60
          40    0.08   10.47    0.11   20.88    0.12  124.59    0.10  124.74
          48    0.12   11.78    0.13   20.09    0.11  129.24    0.11  137.19
          56    0.14   12.40    0.13   22.13    0.15  124.16    0.15  138.52
          64    0.14   11.08    0.20   21.08    0.18  131.28    0.19  144.84

-----------------------------------------------------------------------
Graphical representation: https://i.imgur.com/OtNLpVX.png
-----------------------------------------------------------------------

REGRESSION TESTING
==================

Regression testing was also performed. The main test bed was the LTP
test suite executed on a QEMU-emulated Debian mips64 system. Some LTP
tests (getrusage04, copy_file_range01) that used to fail on non-MTTCG
systems pass on MTTCG-enabled systems. Also, some LTP tests
(nanosleep01, poll02, pselect01) fail intermittently on both non-MTTCG
and MTTCG configurations, and therefore do not represent valid
regressions. The emulation itself did not appear to have any problems
while executing the LTP test suite. QEMU user-mode MTTCG-enabled
emulation was also tested to some extent.

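For readers unfamiliar with the approach named in patches 1 and 2
(comparing addresses in the LL/SC sequence and implementing SC with
cmpxchg), the general idea can be sketched in standalone C11 as shown
here. This is only an illustration under stated assumptions, not the
code from this series: the VcpuSketch type and the emulate_ll(),
emulate_sc() and atomic_inc_via_llsc() names are hypothetical, and
C11 <stdatomic.h> stands in for QEMU's own atomic helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vCPU state for this sketch (not QEMU's CPU state). */
typedef struct {
    _Atomic uint32_t *ll_addr;  /* address recorded by the last LL, or NULL */
    uint32_t ll_val;            /* value observed by the last LL */
} VcpuSketch;

/* LL: load the word and remember both the address and the value seen. */
static uint32_t emulate_ll(VcpuSketch *cpu, _Atomic uint32_t *addr)
{
    cpu->ll_addr = addr;
    cpu->ll_val = atomic_load(addr);
    return cpu->ll_val;
}

/*
 * SC: succeed only if the address matches the one recorded by LL and
 * memory still holds the value LL observed. The compare-and-swap makes
 * the check and the store a single atomic step.
 * Returns 1 on success and 0 on failure.
 */
static int emulate_sc(VcpuSketch *cpu, _Atomic uint32_t *addr,
                      uint32_t newval)
{
    bool ok = false;

    if (cpu->ll_addr == addr) {
        uint32_t expected = cpu->ll_val;
        ok = atomic_compare_exchange_strong(addr, &expected, newval);
    }
    cpu->ll_addr = NULL;        /* the link is consumed either way */
    return ok ? 1 : 0;
}

/* Guest-style atomic increment: retry the LL/SC pair until SC succeeds. */
static uint32_t atomic_inc_via_llsc(VcpuSketch *cpu, _Atomic uint32_t *addr)
{
    uint32_t old;

    do {
        old = emulate_ll(cpu, addr);
    } while (!emulate_sc(cpu, addr, old + 1));
    return old;
}
```

One known limitation of any value-comparing scheme like this sketch is
that it cannot notice a location that was changed and then restored to
its original value between LL and SC (the ABA case), which real LL/SC
hardware would detect.
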
Aleksandar Markovic (2):
  Revert "target/mips: hold BQL for timer interrupts"
  target/mips: introduce MTTCG-enabled builds

Goran Ferenc (1):
  target/mips: hold BQL in mips_vpe_wake()

Leon Alrae (2):
  target/mips: compare virtual addresses in LL/SC sequence
  target/mips: reimplement SC instruction and use cmpxchg

Miodrag Dinic (2):
  hw/mips_int: hold BQL for all interrupt requests
  hw/mips_cpc: kick a VP when putting it into Run state

 configure               |   3 ++
 hw/mips/mips_int.c      |  12 +++++
 hw/misc/mips_cpc.c      |  17 ++++++-
 linux-user/main.c       |  58 ------------------------
 target/mips/cpu.h       |   9 ++--
 target/mips/helper.c    |   6 +--
 target/mips/helper.h    |   2 -
 target/mips/machine.c   |   7 +--
 target/mips/op_helper.c |  74 +++++++++---------------------
 target/mips/translate.c | 118 ++++++++++++++++--------------------------------
 10 files changed, 100 insertions(+), 206 deletions(-)

-- 
2.7.4