Jackie-Jiang commented on code in PR #17342:
URL: https://github.com/apache/pinot/pull/17342#discussion_r2615903697
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/routing/WorkerManager.java:
##########
@@ -362,6 +362,10 @@ private List<QueryServerInstance> getCandidateServers(DispatchablePlanContext co
} else {
candidateServers = getCandidateServersPerTables(context);
}
+ // Sort to ensure deterministic worker ID assignment across stages.
Review Comment:
Yeah, this is mostly for future-proofing.
If we don't want the singleton case to be deterministic, we can move the sort
to the caller side, since the singleton case doesn't need sorting.
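
A minimal sketch of what sorting on the caller side could look like. Plain strings stand in for server instances, and the names `CallerSideSort`, `pickSingleton`, and `sortedForWorkers` are hypothetical; none of them come from `WorkerManager`:

```java
import java.util.ArrayList;
import java.util.List;

public class CallerSideSort {
  // Hypothetical singleton path: any candidate will do, so no ordering is required.
  static String pickSingleton(List<String> candidates, int requestId) {
    return candidates.get(Math.floorMod(requestId, candidates.size()));
  }

  // Hypothetical multi-worker path: sort a copy so that worker ID i always
  // refers to the same server, regardless of the order candidates arrived in.
  static List<String> sortedForWorkers(List<String> candidates) {
    List<String> sorted = new ArrayList<>(candidates);
    sorted.sort(null); // null comparator means natural (lexicographic) order
    return sorted;
  }
}
```

With this split, only callers that assign worker IDs pay for the sort; the singleton caller keeps whatever selection policy it already has.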
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/routing/WorkerManager.java:
##########
@@ -502,14 +506,27 @@ private void assignWorkersToNonPartitionedLeafFragment(DispatchablePlanMetadata
metadata.addUnavailableSegments(tableName, routingTable.getUnavailableSegments());
}
}
- int workerId = 0;
+ // Sort server instances to ensure deterministic worker ID assignment.
+ // This is critical for pre-partitioned exchanges where worker ID N on one stage
+ // must map to the same physical server as worker ID N on another stage.
+ List<Map.Entry<ServerInstance, Map<String, List<String>>>> sortedServerInstanceToSegmentsMap =
+ new ArrayList<>(serverInstanceToSegmentsMap.entrySet());
+ sortedServerInstanceToSegmentsMap.sort(Comparator.comparing(entry -> entry.getKey().getInstanceId()));
+
Map<Integer, QueryServerInstance> workerIdToServerInstanceMap = new HashMap<>();
Review Comment:
(minor) Not introduced in this PR, but we could pre-size these maps
(`Maps.newHashMapWithExpectedSize()`), since the number of workers is known
up front.
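
A rough JDK-only sketch of the pre-sizing idea. The helper `newPreSizedMap` is hypothetical and only approximates what Guava's `Maps.newHashMapWithExpectedSize()` does; it assumes the default `HashMap` load factor of 0.75:

```java
import java.util.HashMap;
import java.util.Map;

public class PreSizedMaps {
  // Hypothetical helper: pick an initial capacity large enough that
  // expectedSize entries fit without triggering a resize at load factor 0.75.
  static <K, V> Map<K, V> newPreSizedMap(int expectedSize) {
    int capacity = (int) Math.ceil(expectedSize / 0.75);
    return new HashMap<>(Math.max(capacity, 1));
  }
}
```

In the hunk above, the expected size would be the number of sorted server entries, since one worker is assigned per server.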
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/routing/WorkerManager.java:
##########
@@ -502,14 +506,27 @@ private void assignWorkersToNonPartitionedLeafFragment(DispatchablePlanMetadata
metadata.addUnavailableSegments(tableName, routingTable.getUnavailableSegments());
}
}
- int workerId = 0;
+ // Sort server instances to ensure deterministic worker ID assignment.
+ // This is critical for pre-partitioned exchanges where worker ID N on one stage
+ // must map to the same physical server as worker ID N on another stage.
+ List<Map.Entry<ServerInstance, Map<String, List<String>>>> sortedServerInstanceToSegmentsMap =
+ new ArrayList<>(serverInstanceToSegmentsMap.entrySet());
+ sortedServerInstanceToSegmentsMap.sort(Comparator.comparing(entry -> entry.getKey().getInstanceId()));
+
Map<Integer, QueryServerInstance> workerIdToServerInstanceMap = new HashMap<>();
Map<Integer, Map<String, List<String>>> workerIdToSegmentsMap = new HashMap<>();
- for (Map.Entry<ServerInstance, Map<String, List<String>>> entry : serverInstanceToSegmentsMap.entrySet()) {
- workerIdToServerInstanceMap.put(workerId, new QueryServerInstance(entry.getKey()));
- workerIdToSegmentsMap.put(workerId, entry.getValue());
- workerId++;
+
+ // Assign 1 worker per server
+ for (int workerId = 0; workerId < sortedServerInstanceToSegmentsMap.size(); workerId++) {
Review Comment:
(nit) Probably not important on a modern JVM, but I usually cache
`sortedServerInstanceToSegmentsMap.size()` in a local variable. Same for the
other place.
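
A simplified sketch of the loop with the size cached, assuming string instance IDs in place of `ServerInstance` and a hypothetical `WorkerAssignment.assign` method; it is not the PR's actual code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WorkerAssignment {
  // Hypothetical stand-in for the PR's loop: keys are instance IDs,
  // values are the segments routed to that instance.
  static Map<Integer, String> assign(Map<String, List<String>> serverToSegments) {
    List<Map.Entry<String, List<String>>> sorted = new ArrayList<>(serverToSegments.entrySet());
    // Sort by instance ID so worker IDs do not depend on map iteration order.
    sorted.sort(Map.Entry.comparingByKey());

    int numServers = sorted.size(); // cached once instead of calling size() per iteration
    Map<Integer, String> workerIdToServer = new HashMap<>();
    for (int workerId = 0; workerId < numServers; workerId++) {
      workerIdToServer.put(workerId, sorted.get(workerId).getKey());
    }
    return workerIdToServer;
  }
}
```

Two maps holding the same servers in different insertion orders yield the same worker ID to server mapping, which is the property the sort is meant to guarantee.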
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]