Re: [PR] Sort servers in WorkerManager to ensure deterministic workerId <-> server mapping across stages in an MSE query [pinot]

via GitHub Wed, 10 Dec 2025 18:04:41 -0800


Jackie-Jiang commented on code in PR #17342:
URL: https://github.com/apache/pinot/pull/17342#discussion_r2608855280



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/routing/WorkerManager.java:
##########
@@ -502,12 +506,18 @@ private void assignWorkersToNonPartitionedLeafFragment(DispatchablePlanMetadata
         metadata.addUnavailableSegments(tableName, routingTable.getUnavailableSegments());
       }
     }
+    // Sort server instances to ensure deterministic worker ID assignment.
+    // This is critical for pre-partitioned exchanges where worker ID N on one stage
+    // must map to the same physical server as worker ID N on another stage.
+    List<ServerInstance> sortedServers = new ArrayList<>(serverInstanceToSegmentsMap.keySet());

Review Comment:
   (minor) Consider putting the map entries into the list, rather than only the keys, to avoid an unnecessary map lookup per server. Same for the other place.
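
   A minimal sketch of that suggestion, under simplifying assumptions: plain `String` instance IDs stand in for `ServerInstance`, a flat segment list stands in for the nested segments map, and the class and method names are hypothetical rather than taken from the PR. In `WorkerManager` itself the comparator would presumably be something like `Map.Entry.comparingByKey(Comparator.comparing(ServerInstance::getInstanceId))`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PR's code: String keys replace ServerInstance.
class SortedEntriesSketch {
  // Copies the entries (not just the keys) into a list and sorts them by key,
  // so the loop below reads key and value without a second map lookup.
  static List<Map.Entry<String, List<String>>> sortedEntries(Map<String, List<String>> serverToSegments) {
    List<Map.Entry<String, List<String>>> entries = new ArrayList<>(serverToSegments.entrySet());
    entries.sort(Map.Entry.comparingByKey());
    return entries;
  }

  public static void main(String[] args) {
    Map<String, List<String>> serverToSegments = new HashMap<>();
    serverToSegments.put("Server_b", List.of("seg2"));
    serverToSegments.put("Server_a", List.of("seg1", "seg3"));

    int workerId = 0;
    for (Map.Entry<String, List<String>> entry : sortedEntries(serverToSegments)) {
      // Worker IDs follow the sorted key order, independent of HashMap iteration order.
      System.out.println(workerId++ + " -> " + entry.getKey() + " " + entry.getValue());
    }
  }
}
```

   The trade-off is a slightly wordier element type in exchange for dropping the `get` call inside the loop.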



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/routing/WorkerManager.java:
##########
@@ -362,6 +362,10 @@ private List<QueryServerInstance> getCandidateServers(DispatchablePlanContext co
     } else {
       candidateServers = getCandidateServersPerTables(context);
     }
+    // Sort to ensure deterministic worker ID assignment across stages.

Review Comment:
   For the singleton instance case, do we need to pick the same server to execute the stage?



##########
pinot-query-planner/src/main/java/org/apache/pinot/query/routing/WorkerManager.java:
##########
@@ -502,12 +506,18 @@ private void assignWorkersToNonPartitionedLeafFragment(DispatchablePlanMetadata
         metadata.addUnavailableSegments(tableName, routingTable.getUnavailableSegments());
       }
     }
+    // Sort server instances to ensure deterministic worker ID assignment.
+    // This is critical for pre-partitioned exchanges where worker ID N on one stage
+    // must map to the same physical server as worker ID N on another stage.
+    List<ServerInstance> sortedServers = new ArrayList<>(serverInstanceToSegmentsMap.keySet());
+    sortedServers.sort(Comparator.comparing(ServerInstance::getInstanceId));
+
     int workerId = 0;
     Map<Integer, QueryServerInstance> workerIdToServerInstanceMap = new HashMap<>();
     Map<Integer, Map<String, List<String>>> workerIdToSegmentsMap = new HashMap<>();
-    for (Map.Entry<ServerInstance, Map<String, List<String>>> entry : serverInstanceToSegmentsMap.entrySet()) {
-      workerIdToServerInstanceMap.put(workerId, new QueryServerInstance(entry.getKey()));
-      workerIdToSegmentsMap.put(workerId, entry.getValue());
+    for (ServerInstance serverInstance : sortedServers) {

Review Comment:
   (minor) An indexed for loop up to `numWorkers = sortedServers.size()` would be more intuitive after the change
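
   Roughly what that could look like, as a sketch only: `String` instance IDs stand in for `ServerInstance` and `QueryServerInstance`, and the class and method names here are hypothetical. The loop index doubles as the worker ID, so no separately incremented counter is needed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the PR's code: String IDs replace the Pinot server types.
class IndexedWorkerLoopSketch {
  static Map<Integer, String> assignWorkers(Map<String, List<String>> serverToSegments) {
    // Sort the servers so worker ID N maps to the same server on every call.
    List<String> sortedServers = new ArrayList<>(serverToSegments.keySet());
    sortedServers.sort(Comparator.naturalOrder());

    int numWorkers = sortedServers.size();
    Map<Integer, String> workerIdToServer = new HashMap<>();
    // The index into the sorted list is the worker ID.
    for (int workerId = 0; workerId < numWorkers; workerId++) {
      workerIdToServer.put(workerId, sortedServers.get(workerId));
    }
    return workerIdToServer;
  }

  public static void main(String[] args) {
    Map<String, List<String>> serverToSegments = new HashMap<>();
    serverToSegments.put("Server_c", List.of("seg3"));
    serverToSegments.put("Server_a", List.of("seg1"));
    serverToSegments.put("Server_b", List.of("seg2"));
    System.out.println(assignWorkers(serverToSegments));
  }
}
```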



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

