github-actions[bot] commented on code in PR #65129:
URL: https://github.com/apache/doris/pull/65129#discussion_r3518546914


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/properties/ChildOutputPropertyDeriver.java:
##########
@@ -440,53 +443,92 @@ public PhysicalProperties 
visitPhysicalSetOperation(PhysicalSetOperation setOper
             return PhysicalProperties.GATHER;
         }
 
-        // TODO: open comment when support `enable_local_shuffle_planner`
-        // int distributeToChildIndex
-        //         = 
setOperation.<Integer>getMutableState(PhysicalSetOperation.DISTRIBUTE_TO_CHILD_INDEX).orElse(-1);
-        // if (distributeToChildIndex >= 0
-        //         && childrenDistribution.get(distributeToChildIndex) 
instanceof DistributionSpecHash) {
-        //     DistributionSpecHash childDistribution
-        //             = (DistributionSpecHash) 
childrenDistribution.get(distributeToChildIndex);
-        //     List<SlotReference> childToIndex = 
setOperation.getRegularChildrenOutputs().get(distributeToChildIndex);
-        //     Map<ExprId, Integer> idToOutputIndex = new LinkedHashMap<>();
-        //     for (int j = 0; j < childToIndex.size(); j++) {
-        //         idToOutputIndex.put(childToIndex.get(j).getExprId(), j);
-        //     }
+        // When set-op bucket shuffle is chosen, the set operation keeps the 
basic child's bucket
+        // distribution as its own output so the bucket distribution 
propagates upward instead of
+        // being flattened to execution-bucketed. The basic child is 
recomputed from the children
+        // distributions instead of being carried as mutable planner state: 
mutable state does not
+        // survive the with-copies in chooseBestPlan() and the
+        // RecomputePhysicalPropertiesPostProcessor re-derivation, while the 
recomputation below is
+        // deterministic on any copy of the plan.
         //
-        //     List<ExprId> orderedShuffledColumns = 
childDistribution.getOrderedShuffledColumns();
-        //     List<ExprId> setOperationDistributeColumnIds = new 
ArrayList<>();
-        //     for (ExprId tableDistributeColumnId : orderedShuffledColumns) {
-        //         Integer index = 
idToOutputIndex.get(tableDistributeColumnId);
-        //         if (index == null) {
-        //             break;
-        //         }
-        //         
setOperationDistributeColumnIds.add(setOperation.getOutput().get(index).getExprId());
-        //     }
-        //     // check whether the set operation output all distribution 
columns of the child
-        //     if (setOperationDistributeColumnIds.size() == 
orderedShuffledColumns.size()) {
-        //         boolean isUnion = setOperation instanceof Union;
-        //         boolean shuffleToRight = distributeToChildIndex > 0;
-        //         if (!isUnion && shuffleToRight) {
-        //             return new PhysicalProperties(
-        //                     new DistributionSpecHash(
-        //                             setOperationDistributeColumnIds,
-        //                             ShuffleType.EXECUTION_BUCKETED
-        //                     )
-        //             );
-        //         } else {
-        //             // keep the distribution as the child
-        //             return new PhysicalProperties(
-        //                     new DistributionSpecHash(
-        //                             setOperationDistributeColumnIds,
-        //                             childDistribution.getShuffleType(),
-        //                             childDistribution.getTableId(),
-        //                             childDistribution.getSelectedIndexId(),
-        //                             childDistribution.getPartitionIds()
-        //                     )
-        //             );
-        //         }
-        //     }
-        // }
+        // The bucket-shuffle signature is structural: every child is 
hash-distributed by NATURAL
+        // or STORAGE_BUCKETED, and at least one child is STORAGE_BUCKETED (an 
enforced
+        // bucket-shuffle child always reports STORAGE_BUCKETED and shares the 
basic child's
+        // bucket layout). Requiring one STORAGE_BUCKETED child rejects the 
unaligned case where
+        // several children merely keep their own NATURAL distributions (e.g. 
union all without a
+        // distribution requirement). The basic child keeps its NATURAL 
distribution, so prefer
+        // the first NATURAL child and fall back to the first STORAGE_BUCKETED 
one.
+        int distributeToChildIndex = -1;
+        int firstNaturalIndex = -1;
+        int firstStorageBucketedIndex = -1;
+        boolean allChildrenBucketAligned = true;
+        for (int i = 0; i < childrenDistribution.size(); i++) {
+            DistributionSpec childDistributionSpec = 
childrenDistribution.get(i);
+            if (!(childDistributionSpec instanceof DistributionSpecHash)) {
+                allChildrenBucketAligned = false;
+                break;
+            }
+            ShuffleType childShuffleType = ((DistributionSpecHash) 
childDistributionSpec).getShuffleType();
+            if (childShuffleType == ShuffleType.NATURAL) {
+                if (firstNaturalIndex < 0) {
+                    firstNaturalIndex = i;
+                }
+            } else if (childShuffleType == ShuffleType.STORAGE_BUCKETED) {
+                if (firstStorageBucketedIndex < 0) {
+                    firstStorageBucketedIndex = i;
+                }
+            } else {
+                allChildrenBucketAligned = false;
+                break;
+            }
+        }
+        if (allChildrenBucketAligned && firstStorageBucketedIndex >= 0) {

Review Comment:
   This structural check can produce a hash/bucket property without proving 
that this set operation was actually planned through the bucket-shuffle 
alternative, or that it picked the same basic child as 
`ChildrenPropertiesRegulator`.
   
   One concrete path is `PhysicalUnion`: 
`RequestPropertyDeriver.visitPhysicalUnion` still adds the cheap `ANY` child 
request before the parent-hash request. For that alternative, the regulator 
takes the `DistributionSpecAny` branch and never chooses a set-op basic child 
or enforces the storage-bucketed child to the natural child's bucket layout. A 
union shaped like `OlapScan t1(id)` (`NATURAL`) plus a lower bucket-shuffle 
plan (`STORAGE_BUCKETED` on `id`, but using another table's buckets) reaches 
this block as `NATURAL + STORAGE_BUCKETED`; it returns a NATURAL hash property, 
and `DistributionSpecHash.satisfy(REQUIRE)` lets an upstream join/agg skip the 
realignment exchange.
   
   There is also a selected-basic mismatch when every child is already 
`STORAGE_BUCKETED`: the regulator can choose child 1 by row count, but this 
recomputation falls back to child 0. For `INTERSECT`/`EXCEPT`, the old 
selected-index logic would see `distributeToChildIndex > 0` and downgrade the 
output to `EXECUTION_BUCKETED`; the new fallback returns `STORAGE_BUCKETED` 
from child 0 instead. Please keep enough immutable basic-child information, or 
make the re-derivation prove it is looking at the exact realigned 
bucket-shuffle alternative; otherwise fall back to a 
non-specific/`EXECUTION_BUCKETED` property and add coverage for both shapes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to