github-actions[bot] commented on code in PR #61988:
URL: https://github.com/apache/doris/pull/61988#discussion_r3019848501


##########
fe/fe-core/src/main/java/org/apache/doris/qe/runtime/ThriftPlansBuilder.java:
##########
@@ -411,11 +411,16 @@ private static TPipelineFragmentParams fragmentToThriftIfAbsent(
             params.setFileScanParams(fileScanRangeParamsMap);
 
             if (fragmentPlan.getFragmentJob() instanceof UnassignedScanBucketOlapTableJob) {
-                int bucketNum = ((UnassignedScanBucketOlapTableJob) fragmentPlan.getFragmentJob())
+                int physicalBucketNum = ((UnassignedScanBucketOlapTableJob) fragmentPlan.getFragmentJob())
                         .getOlapScanNodes()
                         .get(0)
                         .getBucketNum();
-                params.setNumBuckets(bucketNum);
+                // When local shuffle creates more instances than physical buckets,
+                // virtual bucket indexes are assigned beyond the physical range.
+                // Expand num_buckets to cover all virtual buckets so that
+                // hash % num_buckets can route data to every instance.
+                int workerInstanceNum = instancesPerWorker.get(worker).size();
+                params.setNumBuckets(Math.max(physicalBucketNum, workerInstanceNum));

Review Comment:
   This FE-side `num_buckets` expansion is not enough for the distributed case. `computeBucketIdToInstanceId()` still publishes only this worker's sparse physical bucket ids plus any virtual ids from `assignLocalShuffleJobs()`, while the BE hashes rows modulo `num_buckets` and requires every produced bucket id to be present in that map (otherwise `ShuffleExchanger::_split_rows()` fails with `Rows mismatched! Data may be lost.`).
   
   A concrete failure case: a 16-bucket table where one worker scans only buckets `{3, 11}` but local shuffle wants 8 instances. After this change the worker still does not publish a complete destination map for the expanded bucket space, so some hash results have no target instance. The expansion is safe only when a worker already owns the full dense bucket range locally.
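   The mismatch can be sketched in isolation (hypothetical numbers; `published` stands in for the bucket-id-to-instance map this worker would actually send, and the virtual-id layout is an assumption for illustration):

   ```java
   import java.util.Arrays;
   import java.util.HashSet;
   import java.util.Set;

   public class BucketRoutingSketch {
       public static void main(String[] args) {
           int physicalBucketNum = 16;
           int workerInstanceNum = 8;
           // The patch's expansion: max(16, 8) keeps num_buckets at 16.
           int numBuckets = Math.max(physicalBucketNum, workerInstanceNum);

           // Bucket ids this worker publishes: its two physical buckets plus
           // virtual ids assigned beyond the physical range (assumed layout).
           Set<Integer> published = new HashSet<>(Arrays.asList(3, 11));
           for (int i = 0; i < workerInstanceNum - 2; i++) {
               published.add(physicalBucketNum + i); // virtual ids 16..21
           }

           // The BE routes each row with hash % numBuckets, so any id in
           // [0, numBuckets) can be produced. Count ids with no destination.
           int unroutable = 0;
           for (int bucketId = 0; bucketId < numBuckets; bucketId++) {
               if (!published.contains(bucketId)) {
                   unroutable++;
               }
           }
           // Only ids 3 and 11 are routable; 14 of 16 have no target.
           System.out.println("unroutable bucket ids: " + unroutable);
       }
   }
   ```

   Every unroutable id corresponds to rows that would trip the `Rows mismatched! Data may be lost.` check on the BE.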



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

