github-actions[bot] commented on code in PR #65129:
URL: https://github.com/apache/doris/pull/65129#discussion_r3523060828
##########
fe/fe-core/src/main/java/org/apache/doris/nereids/properties/ChildOutputPropertyDeriver.java:
##########
@@ -525,6 +606,23 @@ public PhysicalProperties
visitPhysicalSetOperation(PhysicalSetOperation setOper
for (int offset : offsetsOfFirstChild) {
request.add(setOperation.getOutput().get(offset).getExprId());
}
+ // The offsets / shuffle-type comparison above does not prove that two
STORAGE_BUCKETED
+ // children use the same bucket function: children distributed by
different tables'
+ // bucket layouts hash the same output columns differently, and
satisfy() of the
+ // returned property does not check the storage layout. Only claim
STORAGE_BUCKETED
+ // when every child shares the same storage layout; otherwise return a
non-specific
+ // property so the parent re-aligns the data.
+ if (firstType == ShuffleType.STORAGE_BUCKETED) {
+ DistributionSpecHash firstHash = (DistributionSpecHash)
childrenDistribution.get(0);
+ for (int i = 1; i < childrenDistribution.size(); i++) {
+ DistributionSpecHash otherHash = (DistributionSpecHash)
childrenDistribution.get(i);
+ if (otherHash.getTableId() != firstHash.getTableId()
+ || otherHash.getSelectedIndexId() !=
firstHash.getSelectedIndexId()
+ ||
!otherHash.getPartitionIds().equals(firstHash.getPartitionIds())) {
+ return PhysicalProperties.createAnyFromHash(firstHash);
+ }
+ }
+ }
return PhysicalProperties.createHash(request, firstType);
Review Comment:
The fallback still erases the storage layout after it has proved the layouts
match. Lines 615-624 compare `tableId`, `selectedIndexId`, and `partitionIds`,
but this return calls `PhysicalProperties.createHash(request, firstType)`,
which builds a `DistributionSpecHash` with the default `tableId=-1`,
`selectedIndexId=-1`, and empty partitions.
That is reachable when a child is bucketed by an equivalent slot rather than
the slot directly output by the set operation. For example, a broadcast/hash
join can keep `orderedShuffledColumns` from `a.id` while the projection feeding
the set op outputs the equivalent `b.id`; the direct-position proof above falls
through, but the generic loop succeeds through `exprIdToEquivalenceSet`. The
lower set op then reports `STORAGE_BUCKETED` with no storage layout. If two
such lower set ops from different tables feed an outer set op, both now have
the same erased layout (`-1/-1/empty`), so the outer same-layout check can
incorrectly treat them as bucket-aligned and skip the realignment exchange.
Please preserve the validated layout when returning the `STORAGE_BUCKETED`
property from this fallback, or downgrade to a non-storage/execution property
instead of returning a layout-less storage bucket property.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]