[jira] [Commented] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?

Adrien Grand (Jira) Tue, 28 Apr 2020 10:04:23 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094692#comment-17094692
 ]


Adrien Grand commented on LUCENE-9087:
--------------------------------------

+1

I like the idea of making the 1D and multi-dimensional cases more consistent, I 
suspect that it will also make it easier to reason about the compression that 
we can apply within a leaf, or make it possible to use more efficient decoding 
techniques that are more similar to what we do in postings.

I also support the idea of moving to 512 points per leaf. Your analysis 
suggests it's a better trade-off. Given how we try to build balanced trees 
today, you might end up with between 513 and 1024 points per leaf , so it 
wouldn't be a risky change.

> Should the BKD tree use a fixed maxPointsInLeafNode? 
> -----------------------------------------------------
>
>                 Key: LUCENE-9087
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9087
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Ignacio Vera
>            Priority: Major
>         Attachments: Study of BKD tree performance with different values for 
> max points per leaf.pdf
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the 
> constructor. For the current default codec the value is set to 1024. This is 
> a good compromise between memory usage and performance of the BKD tree.
> Lowering this value can increase search performance but it has a penalty in 
> memory usage. Now that the BKD tree can be load off-heap, this can be less of 
> a concern. Note that lowering too much that value can hurt performance as 
> well as the tree becomes too deep and benefits are gone.
> For data types that use the tree as an effective R-tree (ranges and shapes 
> datatypes) the benefits are larger as it can minimise the overlap between 
> leaf nodes. 
> Finally, creating too many leaf nodes can be dangerous at write time as 
> memory usage depends on the number of leaf nodes created. The writer creates 
> a long array of length = numberOfLeafNodes.
> What I am wondering here is if we can improve this situation in order to 
> create the most efficient tree? My current ideas are:
>  
>  * We can adapt the points per leaf depending on that number so we create a 
> tree with the best depth and best points per leaf. Note that for the for 1D 
> case we have an upper estimation of the number of points that the tree will 
> be indexing. 
>  * Add a mechanism so field types can easily define their best points per 
> leaf. In the case, field types like ranges or shapes can define its own value 
> to minimise overlap.
>  * Maybe the default is just too high now that we can load the tree off-heap.
>  
> Any thoughts?
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?

Reply via email to