gortiz opened a new pull request, #14212:
URL: https://github.com/apache/pinot/pull/14212

This PR fixes two non-critical but annoying issues in the multi-stage `EXPLAIN PLAN` output:
   
   ## Issue 1
Plans from different servers were not merged correctly when the segments on each server produced different plans. For example, with a colocated join, the following query:
   
   ```sql
   EXPLAIN PLAN FOR
   SELECT DISTINCT deviceOS, groupUUID
   FROM userAttributes AS a
   JOIN userGroups AS g
   ON a.userUUID = g.userUUID
   WHERE g.groupUUID = 'group-1'
   LIMIT 100
   ```
   
Before this fix, it produced:
   
```
Execution Plan
LogicalSort(offset=[0], fetch=[100])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    LogicalSort(fetch=[100])
      PinotLogicalAggregate(group=[{0, 1}])
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 2}])
            LogicalJoin(condition=[=($1, $3)], joinType=[inner])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userAttributes])
                  StreamingInstanceResponse
                    StreamingCombineSelect(repeated=[4])
                      SelectStreaming(table=[userAttributes], totalDocs=[10000])
                        Project(columns=[[deviceOS, userUUID]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[10000])
              IntermediateCombine
                Alternative(servers=[1])
                  PinotLogicalExchange(distribution=[hash[1]])
                    LeafStageCombineOperator(table=[userGroups])
                      StreamingInstanceResponse
                        StreamingCombineSelect
                          SelectStreaming(segment=[userGroups_OFFLINE_0], table=[userGroups], totalDocs=[7])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
                          SelectStreaming(segment=[userGroups_OFFLINE_4], table=[userGroups], totalDocs=[4])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterEmpty
                          SelectStreaming(segment=[userGroups_OFFLINE_6], table=[userGroups], totalDocs=[4])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[10000])
                                FilterMatchEntireSegment(numDocs=[4])
                Alternative(servers=[1])
                  PinotLogicalExchange(distribution=[hash[1]])
                    LeafStageCombineOperator(table=[userGroups])
                      StreamingInstanceResponse
                        StreamingCombineSelect(repeated=[4])
                          SelectStreaming(table=[userGroups], totalDocs=[2471])
                            Project(columns=[[groupUUID, userUUID]])
                              DocIdSet(maxDocs=[40000])
                                FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
```
   
With this change, both alternatives are merged, producing the following plan:
```
Execution Plan
LogicalSort(offset=[0], fetch=[100])
  PinotLogicalSortExchange(distribution=[hash], collation=[[]], isSortOnSender=[false], isSortOnReceiver=[false])
    LogicalSort(fetch=[100])
      PinotLogicalAggregate(group=[{0, 1}])
        PinotLogicalExchange(distribution=[hash[0, 1]])
          PinotLogicalAggregate(group=[{0, 2}])
            LogicalJoin(condition=[=($1, $3)], joinType=[inner])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userAttributes])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[userAttributes], totalDocs=[10000])
                        Project(columns=[[deviceOS, userUUID]])
                          DocIdSet(maxDocs=[40000])
                            FilterMatchEntireSegment(numDocs=[10000])
              PinotLogicalExchange(distribution=[hash[1]])
                LeafStageCombineOperator(table=[userGroups])
                  StreamingInstanceResponse
                    StreamingCombineSelect
                      SelectStreaming(table=[userGroups], totalDocs=[2478])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[50000])
                            FilterInvertedIndex(predicate=[groupUUID = 'group-1'], indexLookUp=[inverted_index], operator=[EQ])
                      SelectStreaming(segment=[userGroups_OFFLINE_4], table=[userGroups], totalDocs=[4])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[10000])
                            FilterEmpty
                      SelectStreaming(segment=[userGroups_OFFLINE_6], table=[userGroups], totalDocs=[4])
                        Project(columns=[[groupUUID, userUUID]])
                          DocIdSet(maxDocs=[10000])
                            FilterMatchEntireSegment(numDocs=[4])
```
   
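This is easier to read: the two `userGroups` alternatives collapse into a single subtree, and the matching `SelectStreaming` nodes from both servers are combined into one whose `totalDocs` (2478 = 7 + 2471) and `maxDocs` (50000 = 10000 + 40000) add up the per-server values.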
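To illustrate the kind of merge shown above, here is a minimal Java sketch built on a simplified plan model that is not Pinot's actual code; `PlanMergeSketch`, `Node`, `node`, `compatible`, `merge` and `mergeSiblings` are all hypothetical names. It assumes that two subtrees can be merged only when they have the same shape (same titles, same non-numeric attributes, pairwise-compatible inputs), that numeric attributes such as `totalDocs` are added together, and that each plan coming from the second server is folded into the first compatible plan from the first server or appended as a new sibling. The `segment` attribute, which Issue 2 covers, is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of explain-plan merging; not Pinot's real classes.
public class PlanMergeSketch {

  // A plan node: a title, its attributes and its inputs (children).
  record Node(String title, Map<String, Object> attrs, List<Node> inputs) {}

  static Node node(String title, Map<String, Object> attrs, Node... inputs) {
    return new Node(title, attrs, List.of(inputs));
  }

  // Two subtrees are compatible if they have the same shape: same titles, same
  // attribute keys, equal non-numeric attribute values and compatible inputs.
  static boolean compatible(Node a, Node b) {
    if (!a.title().equals(b.title()) || !a.attrs().keySet().equals(b.attrs().keySet())
        || a.inputs().size() != b.inputs().size()) {
      return false;
    }
    for (String key : a.attrs().keySet()) {
      Object va = a.attrs().get(key);
      Object vb = b.attrs().get(key);
      boolean bothLong = va instanceof Long && vb instanceof Long;
      if (!bothLong && !Objects.equals(va, vb)) {
        return false;
      }
    }
    for (int i = 0; i < a.inputs().size(); i++) {
      if (!compatible(a.inputs().get(i), b.inputs().get(i))) {
        return false;
      }
    }
    return true;
  }

  // Merges two compatible subtrees: Long attributes are summed, other attributes
  // are kept as-is because compatible() already checked that they are equal.
  static Node merge(Node a, Node b) {
    Map<String, Object> attrs = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : a.attrs().entrySet()) {
      Object other = b.attrs().get(e.getKey());
      if (e.getValue() instanceof Long x && other instanceof Long y) {
        attrs.put(e.getKey(), x + y);
      } else {
        attrs.put(e.getKey(), e.getValue());
      }
    }
    List<Node> inputs = new ArrayList<>();
    for (int i = 0; i < a.inputs().size(); i++) {
      inputs.add(merge(a.inputs().get(i), b.inputs().get(i)));
    }
    return new Node(a.title(), attrs, inputs);
  }

  // Merges the sibling plans of two servers: each plan from `right` is folded into
  // the first compatible plan already in the result, or appended if none matches.
  static List<Node> mergeSiblings(List<Node> left, List<Node> right) {
    List<Node> result = new ArrayList<>(left);
    for (Node r : right) {
      int match = -1;
      for (int i = 0; i < result.size() && match < 0; i++) {
        if (compatible(result.get(i), r)) {
          match = i;
        }
      }
      if (match >= 0) {
        result.set(match, merge(result.get(match), r));
      } else {
        result.add(r);
      }
    }
    return result;
  }
}
```

For example, merging `[SelectStreaming(totalDocs=7) → FilterInvertedIndex, SelectStreaming(totalDocs=4) → FilterEmpty]` with `[SelectStreaming(totalDocs=2471) → FilterInvertedIndex]` yields two siblings, the first with `totalDocs=2478`, mirroring the `userGroups` subtree above. Matching on the whole subtree shape, rather than on the top node alone, is what keeps the `FilterEmpty` branch separate.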
   
   ## Issue 2
There was an error in how IDEMPOTENT and IGNORABLE attributes were merged, which caused `SelectStreaming` to randomly include the `segment` attribute. The expected behavior is that this attribute appears only when there is a single plan for that segment. Before this fix, the attribute was removed when merging two plans that both had it with different values, but it was kept when merging a plan without the attribute with a plan that had it.
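A minimal Java sketch of the rule described above, applied to the `segment` attribute alone. It is not Pinot's API: `SegmentAttributeMergeSketch`, `mergeSegment` and `mergeSegmentBuggy` are hypothetical names, and modelling an absent attribute as `Optional.empty()` is an assumption made for illustration. The fixed rule keeps the value only when both plans carry it with the same value; the "buggy" variant reproduces the pre-fix behavior the description reports.

```java
import java.util.Optional;

// Hypothetical sketch, not Pinot's API: merging a per-segment attribute such as
// `segment` when two plans are combined. Optional.empty() means "attribute absent".
public class SegmentAttributeMergeSketch {

  // Fixed rule: keep the value only if both plans carry it and agree on it.
  // A plan without the attribute already covers several segments, so the
  // merged plan must not claim a single segment either.
  static Optional<String> mergeSegment(Optional<String> left, Optional<String> right) {
    return left.isPresent() && left.equals(right) ? left : Optional.empty();
  }

  // Pre-fix behavior as described: differing values were dropped, but a value
  // present on only one side leaked into the merged plan.
  static Optional<String> mergeSegmentBuggy(Optional<String> left, Optional<String> right) {
    if (left.isPresent() && right.isPresent()) {
      return left.equals(right) ? left : Optional.empty();
    }
    return left.isPresent() ? left : right;
  }

  public static void main(String[] args) {
    Optional<String> s0 = Optional.of("userGroups_OFFLINE_0");
    Optional<String> s4 = Optional.of("userGroups_OFFLINE_4");
    // Merge three plans one after another: OFFLINE_0, OFFLINE_4, then OFFLINE_0 again.
    Optional<String> fixed = mergeSegment(mergeSegment(s0, s4), s0);
    Optional<String> buggy = mergeSegmentBuggy(mergeSegmentBuggy(s0, s4), s0);
    System.out.println("fixed: " + fixed); // fixed: Optional.empty
    System.out.println("buggy: " + buggy); // buggy: Optional[userGroups_OFFLINE_0]
  }
}
```

With the buggy variant the outcome depends on merge order: merging OFFLINE_0, OFFLINE_0 and then OFFLINE_4 drops the attribute, while OFFLINE_0, OFFLINE_4 and then OFFLINE_0 keeps it, which is one way the attribute could appear seemingly at random. The fixed rule drops it in both orders.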

