gortiz commented on issue #12949: URL: https://github.com/apache/pinot/issues/12949#issuecomment-2079590396
I was able to replicate a similar plan in MultiStageQuickStart using: ```sql select playerID, numberOfGamesAsBatter, playerName, runs, intentionalWalks, CAST(hits as BIGINT) as Zeit from baseballStats a join dimBaseballTeams b on a.teamID = b.teamID where not exists ( select 1 from baseballStats c where c.teamID = a.teamID and c.yearID < a.yearID) order by Zeit desc ``` Which generates ``` Execution Plan LogicalSort(sort0=[$5], dir0=[DESC]) PinotLogicalSortExchange(distribution=[hash], collation=[[5 DESC]], isSortOnSender=[false], isSortOnReceiver=[true]) LogicalProject(playerID=[$3], numberOfGamesAsBatter=[$2], playerName=[$4], runs=[$5], intentionalWalks=[$1], Zeit=[CAST($0):BIGINT NOT NULL]) LogicalFilter(condition=[IS NULL($11)]) LogicalJoin(condition=[AND(=($6, $9), =($7, $10))], joinType=[left]) PinotLogicalExchange(distribution=[hash[6, 7]]) LogicalJoin(condition=[=($6, $8)], joinType=[inner]) PinotLogicalExchange(distribution=[hash[6]]) LogicalProject(hits=[$9], intentionalWalks=[$12], numberOfGamesAsBatter=[$15], playerID=[$16], playerName=[$17], runs=[$19], teamID=[$25], yearID=[$27]) LogicalTableScan(table=[[default, baseballStats]]) PinotLogicalExchange(distribution=[hash[0]]) LogicalProject(teamID=[$3]) LogicalTableScan(table=[[default, dimBaseballTeams]]) PinotLogicalExchange(distribution=[hash[0, 1]]) LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)]) PinotLogicalExchange(distribution=[hash[0, 1]]) LogicalAggregate(group=[{0, 1}], agg#0=[MIN($2)]) LogicalProject(teamID0=[$2], yearID0=[$3], $f0=[true]) LogicalJoin(condition=[AND(=($0, $2), <($1, $3))], joinType=[inner]) PinotLogicalExchange(distribution=[hash[0]]) LogicalProject(teamID=[$25], yearID=[$27]) LogicalTableScan(table=[[default, baseballStats]]) PinotLogicalExchange(distribution=[hash[0]]) LogicalAggregate(group=[{0, 1}]) PinotLogicalExchange(distribution=[hash[0, 1]]) LogicalAggregate(group=[{0, 1}]) LogicalJoin(condition=[=($0, $2)], joinType=[inner]) PinotLogicalExchange(distribution=[hash[0]]) LogicalProject(teamID=[$25], yearID=[$27]) LogicalTableScan(table=[[default, baseballStats]]) PinotLogicalExchange(distribution=[hash[0]]) LogicalProject(teamID=[$3]) LogicalTableScan(table=[[default, dimBaseballTeams]]) ``` But at lest with the numbers we have in this quickstart, the rows returned by this version are smaller than the number returned without the where. I'll take a look next week trying to understand the rules that have been applied to generate that plan. Meanwhile, can you try to use LAST_WITH_TIME instead? IICU your query, that should be semantically equivalent to: ``` select a.acctuniquesessionid, a.acctstatustype, a.ts, b.acctstatustype, b.ts, (cast(b.ts as BIGINT) - cast(a.ts as BIGINT)) / 1000 as Zeit from ( select acctuniquesessionid, LAST_WITH_TIME(acctstatustype, ts, "STRING") as acctstatustype, MAX(ts) as ts from radius_start group by acctuniquesessionid ) as a join radius_stop b on a.acctuniquesessionid = b.acctuniquesessionid order by Zeit desc ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org