Jackie-Jiang opened a new pull request, #11557: URL: https://github.com/apache/pinot/pull/11557
#10254 introduced the sort then merge sort algorithm to handle large in clause. It has time complexity of `O(m + n)` with `m` in clause values and `n` dictionary size. If we do not count the fixed sorting cost (amortized because it is done only once), comparing to binary search with complexity of `O(mlogn)`, it can perform much worse when `n >> m`. This PR optimizes the algorithm to do divide and concur: we binary search the middle value of the in clause, then divide the search into 2 parts, each with half of the in values and dictionary range. Overall the time complexity should be within `O(m)` (when m is large) to `O(mlogn)` (when m is small), but always better than binary search, and at least on par with the merge sort. Removed the old query option: `inPredicateSortThreshold` Added 2 query options: - `inPredicatePreSorted`: indicate that the given in clause is already sorted - `skipInPredicateSort`: do not sort the in clause values and use vanilla binary search -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org