Jackie-Jiang commented on code in PR #16966:
URL: https://github.com/apache/pinot/pull/16966#discussion_r2411970162
##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/NullHandlingIntegrationTest.java:
##########
@@ -459,4 +459,45 @@ public Object[][] nullLiteralQueries() {
{String.format("SELECT tan(null) FROM %s", getTableName()), "null"}
};
}
+
+ /// This test ensures IS_TRUE can be trimmed off on leaf stage
+ @Test(dataProvider = "useBothQueryEngines")
+ public void testFilteredAggregationNoScanInFilter(boolean useMultiStageQueryEngine)
+ throws Exception {
+ setUseMultiStageQueryEngine(useMultiStageQueryEngine);
+
+ String query = "SELECT city, COUNT(*), COUNT(*) FILTER(WHERE description = 'unknown') FROM mytable GROUP BY city";
+
+ if (useMultiStageQueryEngine) {
+ // MSE will insert IS_TRUE to the aggregate filter
+ explainLogical(query,
+ "Execution Plan\n"
+ + "PinotLogicalAggregate(group=[{0}], agg#0=[COUNT($1)], agg#1=[COUNT($2)], aggType=[FINAL])\n"
+ + " PinotLogicalExchange(distribution=[hash[0]])\n"
+ + " PinotLogicalAggregate(group=[{0}], agg#0=[COUNT()], agg#1=[COUNT() FILTER $1], aggType=[LEAF])\n"
+ + " LogicalProject(city=[$5], $f1=[IS TRUE(=($7, _UTF-8'unknown'))])\n"
+ + " PinotLogicalTableScan(table=[[default, mytable]])\n");
+ // IS_TRUE should be trimmed off, then the filter becomes always false in the server execution plan
+ explainAskingServers(query,
+ "Execution Plan\n"
+ + "PinotLogicalAggregate(group=[{0}], agg#0=[COUNT($1)], agg#1=[COUNT($2)], aggType=[FINAL])\n"
+ + " PinotLogicalExchange(distribution=[hash[0]])\n"
+ + " LeafStageCombineOperator(table=[mytable])\n"
+ + " StreamingInstanceResponse\n"
+ + " CombineGroupBy\n"
+ + " GroupByFiltered(groupKeys=[[city]], aggregations=[[count(*), count(*)]])\n"
+ + " Project(columns=[[city]])\n"
+ + " DocIdSet(maxDocs=[20000])\n"
+ + " FilterEmpty\n"
Review Comment:
The server looks up the dictionary when creating the physical (server) execution
plan and cannot find a value matching `'unknown'`, so the filter is planned as
`FilterEmpty`. Without the special handling in this PR, the filter remains a UDF
(`IS_TRUE(description = 'unknown')`), and we need to scan all records in order to
execute the query.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]