github-actions[bot] commented on code in PR #65129:
URL: https://github.com/apache/doris/pull/65129#discussion_r3523060828


##########
fe/fe-core/src/main/java/org/apache/doris/nereids/properties/ChildOutputPropertyDeriver.java:
##########
@@ -525,6 +606,23 @@ public PhysicalProperties 
visitPhysicalSetOperation(PhysicalSetOperation setOper
         for (int offset : offsetsOfFirstChild) {
             request.add(setOperation.getOutput().get(offset).getExprId());
         }
+        // The offsets / shuffle-type comparison above does not prove that two 
STORAGE_BUCKETED
+        // children use the same bucket function: children distributed by 
different tables'
+        // bucket layouts hash the same output columns differently, and 
satisfy() of the
+        // returned property does not check the storage layout. Only claim 
STORAGE_BUCKETED
+        // when every child shares the same storage layout; otherwise return a 
non-specific
+        // property so the parent re-aligns the data.
+        if (firstType == ShuffleType.STORAGE_BUCKETED) {
+            DistributionSpecHash firstHash = (DistributionSpecHash) 
childrenDistribution.get(0);
+            for (int i = 1; i < childrenDistribution.size(); i++) {
+                DistributionSpecHash otherHash = (DistributionSpecHash) 
childrenDistribution.get(i);
+                if (otherHash.getTableId() != firstHash.getTableId()
+                        || otherHash.getSelectedIndexId() != 
firstHash.getSelectedIndexId()
+                        || 
!otherHash.getPartitionIds().equals(firstHash.getPartitionIds())) {
+                    return PhysicalProperties.createAnyFromHash(firstHash);
+                }
+            }
+        }
         return PhysicalProperties.createHash(request, firstType);

Review Comment:
   The fallback still erases the storage layout after it has proved the layouts 
match. Lines 615-624 compare `tableId`, `selectedIndexId`, and `partitionIds`, 
but this return calls `PhysicalProperties.createHash(request, firstType)`, 
which builds a `DistributionSpecHash` with the default `tableId=-1`, 
`selectedIndexId=-1`, and empty partitions.
   
   That is reachable when a child is bucketed by an equivalent slot rather than 
the slot directly output by the set operation. For example, a broadcast/hash 
join can keep `orderedShuffledColumns` from `a.id` while the projection feeding 
the set op outputs the equivalent `b.id`; the direct-position proof above falls 
through, but the generic loop succeeds through `exprIdToEquivalenceSet`. The 
lower set op then reports `STORAGE_BUCKETED` with no storage layout. If two 
such lower set ops from different tables feed an outer set op, both now have 
the same erased layout (`-1/-1/empty`), so the outer same-layout check can 
incorrectly treat them as bucket-aligned and skip the realignment exchange.
   
   Please preserve the validated layout when returning the `STORAGE_BUCKETED` 
property from this fallback, or downgrade to a non-storage/execution property 
instead of returning a layout-less storage bucket property.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to