gortiz opened a new pull request, #14212: URL: https://github.com/apache/pinot/pull/14212
This PR fixes two non-critical but annoying issues in multi-stage explain plans:

## Issue 1

Plans from different servers were not correctly merged when the segments on each server produced different plans. For example, in a colocated join, the following query:

```sql
EXPLAIN PLAN FOR
SELECT DISTINCT deviceOS, groupUUID
FROM userAttributes AS a
JOIN userGroups AS g
  ON a.userUUID = g.userUUID
WHERE g.groupUUID = 'group-1'
LIMIT 100
```

Produced:

```
Execution Plan
LogicalSort(offset=[0], fetch=[100])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    LogicalSort(fetch=[100])
      PinotLogicalAggregate(group=[{0, 1}])
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 2}])
            LogicalJoin(condition=[=($1, $3)], joinType=[inner])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userAttributes])
                  StreamingInstanceResponse
                    StreamingCombineSelect(repeated=[4])
                      SelectStreaming(table=[userAttributes], totalDocs=[10000])
                        Project(columns=[[deviceOS, userUUID]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[10000])
              IntermediateCombine
                Alternative(servers=[1])
                  PinotLogicalExchange(distribution=[hash[1]])
                    LeafStageCombineOperator(table=[userGroups])
                      StreamingInstanceResponse
                        StreamingCombineSelect
                          SelectStreaming(segment=[userGroups_OFFLINE_0], table=[userGroups], totalDocs=[7])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
                          SelectStreaming(segment=[userGroups_OFFLINE_4], table=[userGroups], totalDocs=[4])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterEmpty
                          SelectStreaming(segment=[userGroups_OFFLINE_6], table=[userGroups], totalDocs=[4])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterMatchEntireSegment(numDocs=[4])
                Alternative(servers=[1])
                  PinotLogicalExchange(distribution=[hash[1]])
                    LeafStageCombineOperator(table=[userGroups])
                      StreamingInstanceResponse
                        StreamingCombineSelect(repeated=[4])
                          SelectStreaming(table=[userGroups], totalDocs=[2471])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[40000])
                                FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
```

With these changes, both alternatives are merged, producing the following explain:

```
Execution Plan
LogicalSort(offset=[0], fetch=[100])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    LogicalSort(fetch=[100])
      PinotLogicalAggregate(group=[{0, 1}])
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 2}])
            LogicalJoin(condition=[=($1, $3)], joinType=[inner])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userAttributes])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[userAttributes], totalDocs=[10000])
                        Project(columns=[[deviceOS, userUUID]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[10000])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userGroups])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[userGroups], totalDocs=[2478])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[50000])
                            FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
                      SelectStreaming(segment=[userGroups_OFFLINE_4], table=[userGroups], totalDocs=[4])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[10000])
                            FilterEmpty
                      SelectStreaming(segment=[userGroups_OFFLINE_6], table=[userGroups], totalDocs=[4])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[10000])
                            FilterMatchEntireSegment(numDocs=[4])
```

This is easier to read.
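
Comparing the two outputs, the merged `SelectStreaming` node reports `totalDocs=[2478]` (2471 + 7) and its `DocIdSet` reports `maxDocs=[50000]` (40000 + 10000), which suggests that numeric attributes are summed when equivalent nodes from different servers are merged. The following is a minimal Java sketch of that arithmetic only. It assumes attributes can be modeled as a plain map from name to `long`; the class and method names are hypothetical and are not Pinot's actual API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not Pinot code: sums numeric attributes of two
// plan nodes that are assumed to have the same operator type and shape.
final class AdditiveAttributeMergeSketch {

  /** Returns a new map where attributes present on both sides are added together. */
  static Map<String, Long> sumAttributes(Map<String, Long> left, Map<String, Long> right) {
    Map<String, Long> merged = new TreeMap<>(left);
    // Map.merge inserts the value if the key is absent, otherwise applies Long::sum.
    right.forEach((name, value) -> merged.merge(name, value, Long::sum));
    return merged;
  }

  public static void main(String[] args) {
    // Values taken from the two alternatives in the explain output above.
    Map<String, Long> serverA = Map.of("totalDocs", 7L, "maxDocs", 10000L);
    Map<String, Long> serverB = Map.of("totalDocs", 2471L, "maxDocs", 40000L);
    System.out.println(sumAttributes(serverA, serverB));
    // prints {maxDocs=50000, totalDocs=2478}
  }
}
```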
## Issue 2

There was an error in how IDEMPOTENT and IGNORABLE attributes were merged, which ended up including the `segment` attribute in `SelectStreaming` seemingly at random. The expected behavior is that this attribute appears only if there is a single plan for that segment. Before this fix, the attribute was removed when merging two plans that both had it with different values, but it was kept when merging a plan without the attribute with a plan that had it.
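
The intended rule for an attribute like `segment` can be sketched as follows. This is a hedged illustration rather than the code in this PR: it assumes an ignorable attribute survives a merge only when both plans carry it with the same value, and it models attributes as a plain `Map<String, String>`. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Pinot code: merges attributes that are
// assumed to be "ignorable", i.e. safe to drop when the two plans disagree.
final class IgnorableAttributeMergeSketch {

  /**
   * Keeps an attribute only if it is present on both sides with equal values.
   * Attributes missing on either side, or with different values, are dropped.
   */
  static Map<String, String> mergeIgnorable(Map<String, String> left, Map<String, String> right) {
    Map<String, String> merged = new HashMap<>();
    for (Map.Entry<String, String> entry : left.entrySet()) {
      String other = right.get(entry.getKey());
      if (other != null && other.equals(entry.getValue())) {
        merged.put(entry.getKey(), entry.getValue());
      }
    }
    // Keys that exist only in `right` are never copied, so they are dropped too.
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> withSegment =
        Map.of("segment", "userGroups_OFFLINE_0", "table", "userGroups");
    Map<String, String> withoutSegment = Map.of("table", "userGroups");
    // The buggy case described above: one side has `segment`, the other does not.
    System.out.println(mergeIgnorable(withSegment, withoutSegment));
    // prints {table=userGroups}
  }
}
```

Under this rule the result does not depend on which of the two plans happens to carry the attribute, which is what removes the apparent randomness.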