gortiz commented on code in PR #14507: URL: https://github.com/apache/pinot/pull/14507#discussion_r1900919744
##########
pinot-query-planner/src/main/java/org/apache/pinot/query/planner/physical/colocated/GreedyShuffleRewriteVisitor.java:
##########
@@ -209,24 +209,43 @@ public Set<ColocationKey> visitMailboxSend(MailboxSendNode node, GreedyShuffleRe
     boolean canSkipShuffleBasic = colocationKeyCondition(oldColocationKeys, distributionKeys);
     // If receiver is not a join-stage, then we can determine distribution type now.
-    if (!context.isJoinStage(node.getReceiverStageId())) {
+    Iterable<Integer> receiverStageIds = node.getReceiverStageIds();
+    if (noneIsJoin(receiverStageIds, context)) {
       Set<ColocationKey> colocationKeys;
-      if (canSkipShuffleBasic && areServersSuperset(node.getReceiverStageId(), node.getStageId())) {
+      if (canSkipShuffleBasic && allAreSuperSet(receiverStageIds, node)) {

Review Comment:
   Otherwise we cannot apply the shuffle optimization. This means that if we find two equivalent stages where one can be optimized with a colocated join and the other cannot, we need to decide whether to apply spooling or colocation. Which one is better? I'm not sure; we will probably need data to understand the difference.

   In theory, if we don't apply spooling, we end up executing the sender stage twice: one execution skips the shuffle, but the other shuffles anyway, so the asymptotic cost is the same. If we apply spooling, the same amount of data is shuffled, but we do less work because the sender stage is executed only once.
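   To make the all-or-nothing condition concrete, here is a minimal sketch of how the two checks behave once a sender has several receivers. Everything in it is an assumption for illustration: the `Context` class, the `serversByStage` map, and the method signatures are hypothetical stand-ins, not the code in this PR. It also assumes that "superset" means the receiver's servers contain all of the sender's servers.

```java
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; names and signatures are hypothetical, not the PR's code.
final class MultiReceiverCheck {

  // Hypothetical stand-in for the planner context.
  static final class Context {
    final Set<Integer> joinStages;                   // stage ids that are join stages
    final Map<Integer, Set<String>> serversByStage;  // servers assigned to each stage

    Context(Set<Integer> joinStages, Map<Integer, Set<String>> serversByStage) {
      this.joinStages = joinStages;
      this.serversByStage = serversByStage;
    }
  }

  // True only if no receiver stage is a join stage.
  static boolean noneIsJoin(Iterable<Integer> receiverStageIds, Context context) {
    for (int receiverId : receiverStageIds) {
      if (context.joinStages.contains(receiverId)) {
        return false;
      }
    }
    return true;
  }

  // True only if every receiver runs on a superset of the sender's servers
  // (assumed direction: receiver servers contain all sender servers).
  static boolean allAreSuperSet(Iterable<Integer> receiverStageIds, int senderStageId, Context context) {
    Set<String> senderServers = context.serversByStage.get(senderStageId);
    for (int receiverId : receiverStageIds) {
      if (!context.serversByStage.get(receiverId).containsAll(senderServers)) {
        return false;
      }
    }
    return true;
  }
}
```

   With one spooled sender and two receivers, a single receiver that is not a superset makes `allAreSuperSet` return false, so the shuffle cannot be skipped for either receiver. That is the case where we have to choose between spooling and colocation.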