bziobrowski opened a new pull request, #14727:
URL: https://github.com/apache/pinot/pull/14727

   This PR adds the following to the multi-stage query engine (MSQE):
   - `group_trim_size` hint: enables trimming at the aggregate operator stage when the query has both an ORDER BY and a LIMIT (currently this also requires the `is_enable_group_trim` hint). Note: `is_enable_group_trim` also enables v1-style trimming of group-by results in the leaf stage. See the [grouping algorithm documentation](https://docs.pinot.apache.org/users/user-guide-query/query-syntax/grouping-algorithm) for details.
   - `error_on_num_groups_limit` hint or `errorOnNumGroupsLimit` query option: throws an exception when `num_groups_limit` is reached in the aggregate operator, instead of only setting a metadata flag.
   
   Examples:
   - enable group by trimming in MSQE intermediate stage:
   Query:
   ```sql
   select /*+ aggOptions(is_enable_group_trim='true',num_groups_limit='50') */ i, j, count(*) as cnt
   from tab
   group by i, j
   order by i, j desc
   limit 5
   ```
   Execution plan:
   ```
   LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], offset=[0], fetch=[5])
     PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1 DESC]], ...)
       LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], fetch=[5])
         PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL]...) <-- trimming happens here
           PinotLogicalExchange(distribution=[hash[0, 1]])
             LeafStageCombineOperator(table=[mytable])
               StreamingInstanceResponse
                 CombineGroupBy
                   GroupBy(groupKeys=[[i, j]], aggregations=[[count(*)]])
                     Project(columns=[[i, j]])
                       DocIdSet(maxDocs=[40000])
                         FilterMatchEntireSegment(numDocs=[80])
   ```
   
   - enable group by trimming in MSQE leaf and intermediate stage:
   Query:
   ```sql
   select /*+ aggOptions(is_enable_group_trim='true',group_trim_size='3') */ t1.i, t1.j, count(*) as cnt
    from tab t1
    join tab t2 on 1=1
    group by t1.i, t1.j
    order by t1.i asc, t1.j asc
    limit 5
   ```
   Execution plan:
   ```
   LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], offset=[0], fetch=[5])
     PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1]], isSortOnSender=[false], isSortOnReceiver=[true])
       LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[5])
         PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL], ...) <-- trimming happens here
           PinotLogicalExchange(distribution=[hash[0, 1]])
             PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT()], aggType=[LEAF], ...) <-- trimming happens here
               LogicalJoin(condition=[true], joinType=[inner])
                 PinotLogicalExchange(distribution=[random])
                   LeafStageCombineOperator(table=[mytable])
                     StreamingInstanceResponse
                       StreamingCombineSelect
                         SelectStreaming(table=[mytable], totalDocs=[80])
                           Project(columns=[[i, j]])
                             DocIdSet(maxDocs=[40000])
                               FilterMatchEntireSegment(numDocs=[80])
                 PinotLogicalExchange(distribution=[broadcast])
                   LeafStageCombineOperator(table=[mytable])
                     StreamingInstanceResponse
                       StreamingCombineSelect
                         SelectStreaming(table=[mytable], totalDocs=[80])
                           Transform(expressions=[['0']])
                             Project(columns=[[]])
                               DocIdSet(maxDocs=[40000])
                                 FilterMatchEntireSegment(numDocs=[80])
   ```
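
   - fail the query when the group limit is reached. This is a minimal sketch rather than an example taken from the PR: it assumes the hint is spelled `error_on_num_groups_limit` (mirroring the `errorOnNumGroupsLimit` query option), that it is passed through `aggOptions` like the other hints, and that the query option can be set with a leading `SET` statement. The limit of 50 is arbitrary.
   Queries:
   ```sql
   -- Variant 1: hint form (assumed spelling: error_on_num_groups_limit)
   select /*+ aggOptions(error_on_num_groups_limit='true',num_groups_limit='50') */ i, j, count(*) as cnt
   from tab
   group by i, j
   order by i, j desc
   limit 5

   -- Variant 2: query option form (assumes the option is set via a SET statement)
   SET errorOnNumGroupsLimit = true;
   select i, j, count(*) as cnt
   from tab
   group by i, j
   order by i, j desc
   limit 5
   ```
   With either variant, the query should fail with an exception once the aggregate operator reaches the group limit, instead of returning a result that only carries a metadata flag.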
   
   
   


