Re: [PR] Add a new rule PinotSeminJoinDistinctProjectRule to apply a distinct to a semi join right side project [pinot]

via GitHub Mon, 06 Jan 2025 20:07:25 -0800


Jackie-Jiang commented on PR #14758:
URL: https://github.com/apache/pinot/pull/14758#issuecomment-2574352116


   Will this rule prevent us from doing:
   ```
   SELECT ... FROM t1 WHERE t1.col IN (SELECT t2.col FROM t2)
   ```
   We don't always want to pay the overhead of distinct when we already know the
value is unique, or there are very few duplicates.
   
   Even without distinct, the result is still correct, just not necessarily the
most efficient way to execute. For the example query, in order to achieve the
desired query plan, we can write it as:
   ```
   SELECT
     distinctCount(a.userUUID),
     a.deviceOS
   FROM userAttributes a WHERE a.userUUID IN
     (
       SELECT DISTINCT userUUID
       FROM userGroups
       WHERE groupUUID = 'group-1'
     )
   GROUP BY a.deviceOS
   ```
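
To illustrate the correctness point, here is a minimal sketch that runs the query with and without `DISTINCT` in the subquery and compares the results. It uses an in-memory SQLite database rather than Pinot, so it only demonstrates the semi-join semantics and says nothing about Pinot's query plan or performance. The table contents are made up for illustration, and `COUNT(DISTINCT ...)` stands in for `distinctCount`, which SQLite does not provide.

```python
import sqlite3


def run_semi_join(use_distinct: bool):
    """Run the example query, optionally with DISTINCT in the IN subquery."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data; note the duplicate ('u1', 'group-1') row.
    conn.executescript("""
        CREATE TABLE userAttributes (userUUID TEXT, deviceOS TEXT);
        CREATE TABLE userGroups (userUUID TEXT, groupUUID TEXT);
        INSERT INTO userAttributes VALUES
            ('u1', 'ios'), ('u2', 'android'), ('u3', 'ios'), ('u4', 'android');
        INSERT INTO userGroups VALUES
            ('u1', 'group-1'), ('u1', 'group-1'),
            ('u3', 'group-1'), ('u2', 'group-2');
    """)
    distinct = "DISTINCT " if use_distinct else ""
    query = f"""
        SELECT COUNT(DISTINCT a.userUUID), a.deviceOS
        FROM userAttributes a
        WHERE a.userUUID IN (
            SELECT {distinct}userUUID
            FROM userGroups
            WHERE groupUUID = 'group-1'
        )
        GROUP BY a.deviceOS
        ORDER BY a.deviceOS
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


with_distinct = run_semi_join(True)
without_distinct = run_semi_join(False)
print(with_distinct == without_distinct)
```

Because `IN` only checks membership, duplicate values on the right side do not change which left-side rows match; the `DISTINCT` affects only how much data the right side produces.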


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


