xiangfu0 commented on PR #14758: URL: https://github.com/apache/pinot/pull/14758#issuecomment-2577004853
> Will this rule prevent us from doing:
> ```
> SELECT ... FROM t1 WHERE t1.col IN (SELECT t2.col FROM t2)
> ```
> We don't always want to pay the overhead of distinct when we already know the values are unique, or there are very few duplicates.
>
> Even without distinct, the result is still correct, just not necessarily the most efficient way to execute. For the example query, in order to achieve the desired query plan, we can write it as:
> ```
> SELECT
>   distinctCount(a.userUUID),
>   a.deviceOS
> FROM userAttributes a WHERE a.userUUID IN
> (
>   SELECT DISTINCT userUUID
>   FROM userGroups
>   WHERE groupUUID = 'group-1'
> )
> GROUP BY a.deviceOS
> ```

I agree with you on the rewrite part. I feel the main issue here is that the generated plan is a semi join without distinct, even though the inner query has a distinct. Technically this should be controllable; could we use a hint to turn the rule on or off?
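To make the correctness point from the quoted comment concrete, here is a minimal sketch of the `IN (subquery)` semantics. It assumes an in-memory SQLite database as a stand-in, not Pinot, and the sample rows for `t1` and `t2` are made up. It says nothing about which plan Pinot produces or how either form performs; it only checks that duplicates on the subquery side do not change the result.

```python
import sqlite3


def in_subquery_rows(use_distinct: bool) -> list:
    """Run the IN-subquery filter with or without DISTINCT on the inner query."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: t2 deliberately contains duplicate values.
    conn.executescript("""
        CREATE TABLE t1 (id INTEGER, col TEXT);
        CREATE TABLE t2 (col TEXT);
        INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
        INSERT INTO t2 VALUES ('a'), ('a'), ('a'), ('b');
    """)
    inner = "SELECT DISTINCT col FROM t2" if use_distinct else "SELECT col FROM t2"
    rows = conn.execute(
        f"SELECT id, col FROM t1 WHERE col IN ({inner}) ORDER BY id"
    ).fetchall()
    conn.close()
    return rows


with_distinct = in_subquery_rows(True)
without_distinct = in_subquery_rows(False)

# IN only tests membership, so each t1 row appears at most once either way.
print(with_distinct == without_distinct)
```

Since both forms return the same rows, the choice between them is purely about execution cost, which is why being able to control the rule (for example through a hint) seems useful.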