jasperjiaguo commented on PR #11557: URL: https://github.com/apache/pinot/pull/11557#issuecomment-1718177293
> > can we comment on the performance of this depending on the number of hits > > for example sorted values length (m) = 1000 dictionary length (n) = 1M > > What is the performance for each of the following strategy when we vary the hits from 0% match to 100% match. > > > > * linear search (mlogn) > > * merge sort o(m+n) > > * this PR's divide and conquer o(m) to o(mlogn) > > In this specific scenario: > > * Regular binary search: O(1000log1M) > * Merge sort: ~= O(1M) > * Divide and conquer: O(log1M + 2 * log500k + 4 * log250k + ... + 512 * log2k) ~= O(1000log4k) @Jackie-Jiang So the complexity of divide and conquer is O(log(n) + 2 log(n/2) + ... + m/2 log(n/(m/2))) ~ O(m logn - m logm), so my guess is when m is at the order of sqrt(n), the user would see some benefits if they have pre-sorted entries in the in clause? If the entries are not pre-sorted then sorting it on user side (O(m logm)) is probably not necessary? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org