[GitHub] [pinot] jasperjiaguo commented on pull request #11557: Optimize in clause evaluation

via GitHub Wed, 13 Sep 2023 12:13:48 -0700


jasperjiaguo commented on PR #11557:
URL: https://github.com/apache/pinot/pull/11557#issuecomment-1718177293


   > > can we comment on the performance of this depending on the number of hits
   > > for example sorted values length (m) = 1000 dictionary length (n) = 1M
   > > What is the performance for each of the following strategy when we vary 
the hits from 0% match to 100% match.
   > > 
   > > * linear search (mlogn)
   > > * merge sort o(m+n)
   > > * this PR's divide and conquer o(m) to o(mlogn)
   > 
   > In this specific scenario:
   > 
   > * Regular binary search: O(1000log1M)
   > * Merge sort: ~= O(1M)
   > * Divide and conquer: O(log1M + 2 * log500k + 4 * log250k + ... + 512 * 
log2k) ~= O(1000log4k)
   @Jackie-Jiang  So the complexity of divide and conquer is O(log(n) + 2 
log(n/2) + ... + m/2 log(n/(m/2))) ~ O(m logn - m logm), so my guess is when m 
is at the order of sqrt(n), the user would see some benefits if they have 
pre-sorted entries in the in clause? If the entries are not pre-sorted then 
sorting it on user side (O(m logm)) is probably not necessary?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] jasperjiaguo commented on pull request #11557: Optimize in clause evaluation

Reply via email to