walterddr opened a new issue, #9944: URL: https://github.com/apache/pinot/issues/9944
see: ``` SELECT count(LO_ORDERKEY) FROM lineorder JOIN customer ON LO_CUSTKEY = C_CUSTKEY WHERE LO_QUANTITY < 10 AND C_REGION = 'AFRICA' LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 17819.155154440043, id = 44195 LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 17818.030154440043, id = 44193 LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 17814.030154440043, id = 44188 LogicalJoin(condition=[=($7, $30)], joinType=[inner]): rowcount = 112.5, cumulative cost = 17812.905154440043, id = 44186 LogicalExchange(distribution=[hash[7]]): rowcount = 50.0, cumulative cost = 15798.092021712584, id = 44181 LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44173 LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44157 LogicalExchange(distribution=[hash[10]]): rowcount = 15.0, cumulative cost = 1902.3131327274587, id = 44182 LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44174 LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44159 ``` The explained plan requires leaf stages to return the entire collection of columns where as: ``` SELECT count(LO_ORDERKEY) FROM (SELECT LO_ORDERKEY, LO_CUSTKEY FROM lineorder WHERE LO_QUANTITY < 10) AS l JOIN (SELECT C_CUSTKEY FROM customer WHERE C_REGION = 'AFRICA') AS c ON l.LO_CUSTKEY = c.C_CUSTKEY LogicalAggregate(group=[{}], EXPR$0=[$SUM0($0)]): rowcount = 1.0, cumulative cost = 2176.042214237391, id = 44268 LogicalExchange(distribution=[hash]): rowcount = 1.0, cumulative cost = 2174.917214237391, id = 44266 LogicalAggregate(group=[{}], EXPR$0=[COUNT()]): rowcount = 1.0, cumulative cost = 2170.917214237391, id = 44261 LogicalJoin(condition=[=($1, $2)], joinType=[inner]): rowcount = 112.5, cumulative cost = 2169.792214237391, id = 44259 <-- NOTE: this is condition =($1, $2) LogicalExchange(distribution=[hash[1]]): rowcount = 50.0, cumulative cost = 1764.8092021712584, id = 44254 LogicalProject(LO_ORDERKEY=[$0], LO_CUSTKEY=[$7]): rowcount = 50.0, cumulative cost = 200.0, id = 44249 LogicalFilter(condition=[<($2, 10)]): rowcount = 50.0, cumulative cost = 150.0, id = 44239 LogicalTableScan(table=[[lineorder]]): rowcount = 100.0, cumulative cost = 100.0, id = 44198 LogicalExchange(distribution=[hash[0]]): rowcount = 15.0, cumulative cost = 292.4830120661326, id = 44255 LogicalProject(C_CUSTKEY=[$10]): rowcount = 15.0, cumulative cost = 130.0, id = 44251 LogicalFilter(condition=[=($7, 'AFRICA')]): rowcount = 15.0, cumulative cost = 115.0, id = 44244 LogicalTableScan(table=[[customer]]): rowcount = 100.0, cumulative cost = 100.0, id = 44202 ``` if we wrapped in sub-query, it properly put the LogicalProject in-place to only return the needed columns. This should be done automatically - if a count is only operating on one of the columns, no others should be returned via the exchange. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org