[GitHub] [pinot] Jackie-Jiang commented on issue #8837: `order by DESC limit N` queries are not optimised at segment level

GitBox Wed, 08 Jun 2022 10:22:03 -0700


Jackie-Jiang commented on issue #8837:
URL: https://github.com/apache/pinot/issues/8837#issuecomment-1150190308


   We are actually talking about the same thing. For descending order case, all 
values will be inserted into the priority queue once, then removed after new 
value comes.
   The confusion is from the "10K elements" because the cost of the current 
algorithm should not be related to the values per block. I see you are 
referring the cost of `SelectionOrderByOperator.computePartiallyOrdered()` 
which is per block, and I was referring the cost of processing the whole 
segment.
   
   - Document count: N (10k per block, millions per segment)
   - Query LIMIT: m
   - Time complexity of the algorithm: O(Nlogm)
   
   I agree this is very not efficient (actually the worst case scenario). We 
should be able to optimize the algorithm for both ascending and descending 
order case, and early terminate when enough documents are collected instead of 
processing all the blocks. For descending order case, we need to scan the 
matched docs in the reverse order


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

[GitHub] [pinot] Jackie-Jiang commented on issue #8837: `order by DESC limit N` queries are not optimised at segment level

Reply via email to