bziobrowski opened a new pull request, #14727: URL: https://github.com/apache/pinot/pull/14727
This PR adds the following to the MSQE:

- `group_trim_size` hint: enables trimming at the aggregate operator stage when the query has both ORDER BY and LIMIT (currently this also requires the `is_enable_group_trim` hint).
  Note: `is_enable_group_trim` also enables v1-style trimming of leaf-stage group by results. See the [grouping algorithm documentation](https://docs.pinot.apache.org/users/user-guide-query/query-syntax/grouping-algorithm) for details.
- `error_on_num_groups_limit` hint or `errorOnNumGroupsLimit` query option: throws an exception when `num_groups_limit` is reached in the aggregate operator, instead of setting a metadata flag.

Examples:

- Enable group by trimming in the MSQE intermediate stage:

Query:
```sql
select /*+ aggOptions(is_enable_group_trim='true',num_groups_limit='50') */
  i, j, count(*) as cnt
from tab
group by i, j
order by i, j desc
limit 5
```

Execution plan:

    LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], offset=[0], fetch=[5])
      PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1 DESC]], ...)
        LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[DESC], fetch=[5])
          PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL]...) <-- trimming happens here
            PinotLogicalExchange(distribution=[hash[0, 1]])
              LeafStageCombineOperator(table=[mytable])
                StreamingInstanceResponse
                  CombineGroupBy
                    GroupBy(groupKeys=[[i, j]], aggregations=[[count(*)]])
                      Project(columns=[[i, j]])
                        DocIdSet(maxDocs=[40000])
                          FilterMatchEntireSegment(numDocs=[80])

- Enable group by trimming in the MSQE leaf and intermediate stages:

Query:
```sql
select /*+ aggOptions(is_enable_group_trim='true',group_trim_size='3') */
  t1.i, t1.j, count(*) as cnt
from tab t1
join tab t2 on 1=1
group by t1.i, t1.j
order by t1.i asc, t1.j asc
limit 5
```

Execution plan:

    LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], offset=[0], fetch=[5])
      PinotLogicalSortExchange(distribution=[hash], collation=[[0, 1]], isSortOnSender=[false], isSortOnReceiver=[true])
        LogicalSort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC], fetch=[5])
          PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT($2)], aggType=[FINAL], ...) <-- trimming happens here
            PinotLogicalExchange(distribution=[hash[0, 1]])
              PinotLogicalAggregate(group=[{0, 1}], agg#0=[COUNT()], aggType=[LEAF], ...) <-- trimming happens here
                LogicalJoin(condition=[true], joinType=[inner])
                  PinotLogicalExchange(distribution=[random])
                    LeafStageCombineOperator(table=[mytable])
                      StreamingInstanceResponse
                        StreamingCombineSelect
                          SelectStreaming(table=[mytable], totalDocs=[80])
                            Project(columns=[[i, j]])
                              DocIdSet(maxDocs=[40000])
                                FilterMatchEntireSegment(numDocs=[80])
                  PinotLogicalExchange(distribution=[broadcast])
                    LeafStageCombineOperator(table=[mytable])
                      StreamingInstanceResponse
                        StreamingCombineSelect
                          SelectStreaming(table=[mytable], totalDocs=[80])
                            Transform(expressions=[['0']])
                              Project(columns=[[]])
                                DocIdSet(maxDocs=[40000])
                                  FilterMatchEntireSegment(numDocs=[80])

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
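For readers unfamiliar with the two behaviours described in the PR body, the following is a minimal, self-contained sketch of the idea only. It is not Pinot's implementation: `GroupTrimSketch`, `countByKey`, `trim`, `NumGroupsLimitException` and the parameter names are hypothetical, the sketch models "limit reached" as a new group arriving while the map already holds `numGroupsLimit` groups, and it does not model how Pinot derives the effective trim size from the hint and the LIMIT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative only: hypothetical names, not classes from the Pinot code base.
class GroupTrimSketch {
    /** Raised when the group limit is hit and the caller asked for an error. */
    static class NumGroupsLimitException extends RuntimeException {
        NumGroupsLimitException(String message) {
            super(message);
        }
    }

    /** Aggregation output plus the metadata flag used when no error is requested. */
    static class Result {
        final Map<String, Long> groups;
        final boolean numGroupsLimitReached;

        Result(Map<String, Long> groups, boolean numGroupsLimitReached) {
            this.groups = groups;
            this.numGroupsLimitReached = numGroupsLimitReached;
        }
    }

    /** count(*) per key, holding at most numGroupsLimit distinct groups. */
    static Result countByKey(List<String> keys, int numGroupsLimit, boolean errorOnLimit) {
        Map<String, Long> counts = new HashMap<>();
        boolean limitReached = false;
        for (String key : keys) {
            Long current = counts.get(key);
            if (current != null) {
                counts.put(key, current + 1);
            } else if (counts.size() < numGroupsLimit) {
                counts.put(key, 1L);
            } else if (errorOnLimit) {
                // Analogue of errorOnNumGroupsLimit: fail the query outright.
                throw new NumGroupsLimitException("group limit of " + numGroupsLimit + " reached");
            } else {
                // Default analogue: drop the new group and only record a flag.
                limitReached = true;
            }
        }
        return new Result(counts, limitReached);
    }

    /** Keeps the first trimSize group keys under the given ORDER BY comparator. */
    static List<String> trim(Map<String, Long> groups, Comparator<String> order, int trimSize) {
        // The reversed comparator puts the worst-ranked key at the head of the heap,
        // so polling evicts exactly the key that sorts last.
        PriorityQueue<String> heap = new PriorityQueue<>(order.reversed());
        for (String key : groups.keySet()) {
            heap.offer(key);
            if (heap.size() > trimSize) {
                heap.poll();
            }
        }
        List<String> kept = new ArrayList<>(heap);
        kept.sort(order);
        return kept;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("b", "a", "c", "a", "d");
        Result result = countByKey(rows, 3, false);
        System.out.println("limit reached: " + result.numGroupsLimitReached);
        System.out.println("kept groups: " + trim(result.groups, Comparator.<String>naturalOrder(), 2));
    }
}
```

The bounded heap is one common way to keep only the best-ranked groups without sorting everything; it is used here as a stand-in, and whether Pinot's aggregate operator trims this way is not stated in the PR description.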
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org