jtao15 commented on issue #6189:
URL: 
https://github.com/apache/incubator-pinot/issues/6189#issuecomment-724346637


   For time range pruning on broker we have two solutions:
   
   1. **Naive O(n)**
   Currently, for each table, brokers keep track of all its online segments. 
PartitionSegmentPruner maintains a mapping from segment names to partition 
info. Same as partition pruning, we can have a time range pruner which keeps a 
mapping from segment names to time range. Pruning is done by looping through 
online segments, getting their partition infos and filtering against query time 
range, this will cost O(n)(n is # of online segments).
   
       **Pros**: easy to implement
   **Cons**: long time to compute the segment pruning for tables with large 
number of segments
   
   2. **O(m*logn)** using some specialized data structure
   Instead of keeping the time range to segment mapping, we can have some 
specialized ordered data structure (e.g. interval search tree: a augmented 
balanced BST) optimized for searching intercepted ranges against query range. 
Pruning is done by searching all intercepted ranges from the interval search 
tree, and checking if  they are online. This will cost O(m*logn) (m is # of 
qualified segments).
   There’s two implementations: 
      
       **Implementation 1**: a concurrent augmented bst
      **Implementation 2**: a read only augmented bst, a new tree will be built 
and atomicly swapped with the old one when there’s an external view change or a 
segment metadata updates 
   
      **Pros**: reduced time to prune for large number of segments
   **Cons**:  
       * More data (1.5 - 2 * naive solution) stored on brokers because of  
additional    pointers and auxiliary info for tree node
       * For Implementation 1, hard to code (a research topic), and reading 
performance will be harmed due to locking
       * For Implementation 2, when there’s an external view change or segment 
metadata updates, the whole tree needs to be gced.
   
   Due to performance requirements and since the pruner is read heavy 
(thousands qps in production, and external view change happens few times a 
day), the solution 2 implementation 2 is considered a better approach.
   
    
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to