From: Glenn Miles <mil...@linux.ibm.com>

The current xive algorithm for finding a matching group vCPU
target always uses the first vCPU found.  And, since it always
starts the search with thread 0 of a core, thread 0 is almost
always used to handle group interrupts.  This can lead to additional
interrupt latency and poor performance for interrupt-intensive
workloads.

Change this to use a simple round-robin algorithm for deciding which
thread number to use when starting a search.  This leads to a more
distributed use of threads for handling group interrupts.

[npiggin: Also round-robin among threads, not just cores]
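
For illustration only, the rotating start position can be sketched
outside of QEMU as below.  The index arithmetic follows the diff in this
patch; the fixed 2x4 topology, the thread_matches[] table and the
pick_target() helper are hypothetical stand-ins and are not part of
pnv_xive2.c.  Unlike the real function, the sketch returns at the first
match instead of continuing to count matches.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CORES   2   /* hypothetical topology, for illustration only */
#define NR_THREADS 4

/* Hypothetical stand-in for "this vCPU matches the group". */
static bool thread_matches[NR_CORES][NR_THREADS];

/* Persist across calls, like the static locals added by the patch. */
static int next_start_core;
static int next_start_thread;

/*
 * Return the flat index (core * NR_THREADS + thread) of the first
 * matching thread in rotated search order, or -1 if nothing matches.
 */
static int pick_target(void)
{
    int start_core = next_start_core;
    int start_thread = next_start_thread;
    int i, j;

    assert(start_core >= 0 && start_core < NR_CORES);
    assert(start_thread >= 0 && start_thread < NR_THREADS);

    for (i = 0; i < NR_CORES; i++) {
        int core = (i + start_core) % NR_CORES;

        for (j = 0; j < NR_THREADS; j++) {
            int thread = (j + start_thread) % NR_THREADS;

            if (!thread_matches[core][thread]) {
                continue;
            }

            /* Same update as in the diff: start after this thread. */
            next_start_thread = j + start_thread + 1;
            if (next_start_thread >= NR_THREADS) {
                next_start_thread = 0;
                next_start_core = i + start_core + 1;
                if (next_start_core >= NR_CORES) {
                    next_start_core = 0;
                }
            }
            return core * NR_THREADS + thread;
        }
    }
    return -1;
}
```

With every entry of thread_matches[] set to true, successive calls to
pick_target() return 0, 1, ..., 7 and then wrap back to 0, rather than
returning 0 every time.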

Signed-off-by: Glenn Miles <mil...@linux.ibm.com>
Reviewed-by: Nicholas Piggin <npig...@gmail.com>
Reviewed-by: Glenn Miles <mil...@linux.ibm.com>
Reviewed-by: Michael Kowal <ko...@linux.ibm.com>
Reviewed-by: Caleb Schlossin <cal...@linux.ibm.com>
Tested-by: Gautam Menghani <gau...@linux.ibm.com>
Link: https://lore.kernel.org/qemu-devel/20250512031100.439842-9-npig...@gmail.com
Signed-off-by: Cédric Le Goater <c...@redhat.com>
---
 hw/intc/pnv_xive2.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/intc/pnv_xive2.c b/hw/intc/pnv_xive2.c
index ec247ce48ff7..25dc8a372d2f 100644
--- a/hw/intc/pnv_xive2.c
+++ b/hw/intc/pnv_xive2.c
@@ -643,13 +643,18 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
     int i, j;
     bool gen1_tima_os =
         xive->cq_regs[CQ_XIVE_CFG >> 3] & CQ_XIVE_CFG_GEN1_TIMA_OS;
+    static int next_start_core;
+    static int next_start_thread;
+    int start_core = next_start_core;
+    int start_thread = next_start_thread;
 
     for (i = 0; i < chip->nr_cores; i++) {
-        PnvCore *pc = chip->cores[i];
+        PnvCore *pc = chip->cores[(i + start_core) % chip->nr_cores];
         CPUCore *cc = CPU_CORE(pc);
 
         for (j = 0; j < cc->nr_threads; j++) {
-            PowerPCCPU *cpu = pc->threads[j];
+            /* Start search for match with different thread each call */
+            PowerPCCPU *cpu = pc->threads[(j + start_thread) % cc->nr_threads];
             XiveTCTX *tctx;
             int ring;
 
@@ -694,6 +699,15 @@ static int pnv_xive2_match_nvt(XivePresenter *xptr, uint8_t format,
                     if (!match->tctx) {
                         match->ring = ring;
                         match->tctx = tctx;
+
+                        next_start_thread = j + start_thread + 1;
+                        if (next_start_thread >= cc->nr_threads) {
+                            next_start_thread = 0;
+                            next_start_core = i + start_core + 1;
+                            if (next_start_core >= chip->nr_cores) {
+                                next_start_core = 0;
+                            }
+                        }
                     }
                     count++;
                 }
-- 
2.50.1

