Jackie-Jiang opened a new pull request, #11557:
URL: https://github.com/apache/pinot/pull/11557

   #10254 introduced the sort then merge sort algorithm to handle large in 
clause. It has time complexity of `O(m + n)` with `m` in clause values and `n` 
dictionary size. If we do not count the fixed sorting cost (amortized because 
it is done only once), comparing to binary search with complexity of 
`O(mlogn)`, it can perform much worse when `n >> m`.
   
   This PR optimizes the algorithm to do divide and concur: we binary search 
the middle value of the in clause, then divide the search into 2 parts, each 
with half of the in values and dictionary range. Overall the time complexity 
should be within `O(m)` (when m is large) to `O(mlogn)` (when m is small), but 
always better than binary search, and at least on par with the merge sort.
   
   Removed the old query option: `inPredicateSortThreshold`
   Added 2 query options:
   - `inPredicatePreSorted`: indicate that the given in clause is already sorted
   - `skipInPredicateSort`: do not sort the in clause values and use vanilla 
binary search


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to