[
https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16997283#comment-16997283
]
Ignacio Vera commented on LUCENE-9087:
--------------------------------------
Thanks [~jpountz] for the explanation.
> Should the BKD tree use a fixed maxPointsInLeafNode?
> -----------------------------------------------------
>
> Key: LUCENE-9087
> URL: https://issues.apache.org/jira/browse/LUCENE-9087
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Ignacio Vera
> Priority: Major
>
> Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the
> constructor. For the current default codec the value is set to 1200. This is
> a good compromise between memory usage and performance of the BKD tree.
> Lowering this value can increase search performance but it has a penalty in
> memory usage. Now that the BKD tree can be load off-heap, this can be less of
> a concern. Note that lowering too much that value can hurt performance as
> well as the tree becomes too deep and benefits are gone.
> For data types that use the tree as an effective R-tree (ranges and shapes
> datatypes) the benefits are larger as it can minimise the overlap between
> leaf nodes.
> Finally, creating too many leaf nodes can be dangerous at write time as
> memory usage depends on the number of leaf nodes created. The writer creates
> a long array of length = numberOfLeafNodes.
> What I am wondering here is if we can improve this situation in order to
> create the most efficient tree? My current ideas are:
>
> * We can adapt the points per leaf depending on that number so we create a
> tree with the best depth and best points per leaf. Note that for the for 1D
> case we have an upper estimation of the number of points that the tree will
> be indexing.
> * Add a mechanism so field types can easily define their best points per
> leaf. In the case, field types like ranges or shapes can define its own value
> to minimise overlap.
> * Maybe the default is just too high now that we can load the tree off-heap.
>
> Any thoughts?
>
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]