richardstartin commented on issue #8837:
URL: https://github.com/apache/pinot/issues/8837#issuecomment-1149520182

   
   > > All 10k records will be inserted into the PQ, regardless of whether the 
insertion results in the removal of another or not. 9990 records will be 
removed from the PQ. This is clearly wasteful.
   > 
   > Not sure if we are referring to the same thing. In 
`SelectionOperatorUtils.addToPriorityQueue()`, we insert the value into the 
query only when it is larger than the top value. The PQ size will be up to 10 
all the time
   
   I think the communication gap here is because you may have missed that the 
scope if this issue is strictly _descending order over sorted columns_ so you 
have jumped to assuming I don't understand the algorithm.
   
   _When the values are sorted and the comparator is descending_, the next 
value is always greater than the smallest element in the PQ. This means the 
next value always added, and the smallest value is removed after the limit is 
reached. E.g. imagine the limit is 2, this table shows the state of the PQ for 
each docId.
   
   | docId | Value | PQ                              |
   | ------ | ------ | ----------------------- |  
   | 0        | 10      | [10]                            |
   | 1        | 15      | [10, 15]                      |
   | 2        | 30      | [15, 30]                      |
   | 3        | 400   |  [30, 400]                    |
   
   The PQ never grows beyond the size of 2, but that claim hasn't actually been 
made. After the first two values, for descending order over a sorted column, 
the second condition below is always true. Every value is added to the PQ once, 
and all but the last 2 values are eventually removed from the PQ. 
   
   ```java
     public static <T> void addToPriorityQueue(T value, PriorityQueue<T> queue, 
int maxNumValues) {
       if (queue.size() < maxNumValues) {
         queue.add(value);
       } else if (queue.comparator().compare(queue.peek(), value) < 0) {
         queue.poll();
         queue.offer(value);
       }
     }
   ```
   
   That is, for each block, the algorithm _"inserts up to 10k elements into a 
priority queue to discard all but the last 10"_.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to