[ https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094692#comment-17094692 ]
Adrien Grand commented on LUCENE-9087: -------------------------------------- +1 I like the idea of making the 1D and multi-dimensional cases more consistent, I suspect that it will also make it easier to reason about the compression that we can apply within a leaf, or make it possible to use more efficient decoding techniques that are more similar to what we do in postings. I also support the idea of moving to 512 points per leaf. Your analysis suggests it's a better trade-off. Given how we try to build balanced trees today, you might end up with between 513 and 1024 points per leaf , so it wouldn't be a risky change. > Should the BKD tree use a fixed maxPointsInLeafNode? > ----------------------------------------------------- > > Key: LUCENE-9087 > URL: https://issues.apache.org/jira/browse/LUCENE-9087 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Ignacio Vera > Priority: Major > Attachments: Study of BKD tree performance with different values for > max points per leaf.pdf > > Time Spent: 10m > Remaining Estimate: 0h > > Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the > constructor. For the current default codec the value is set to 1024. This is > a good compromise between memory usage and performance of the BKD tree. > Lowering this value can increase search performance but it has a penalty in > memory usage. Now that the BKD tree can be load off-heap, this can be less of > a concern. Note that lowering too much that value can hurt performance as > well as the tree becomes too deep and benefits are gone. > For data types that use the tree as an effective R-tree (ranges and shapes > datatypes) the benefits are larger as it can minimise the overlap between > leaf nodes. > Finally, creating too many leaf nodes can be dangerous at write time as > memory usage depends on the number of leaf nodes created. The writer creates > a long array of length = numberOfLeafNodes. > What I am wondering here is if we can improve this situation in order to > create the most efficient tree? My current ideas are: > > * We can adapt the points per leaf depending on that number so we create a > tree with the best depth and best points per leaf. Note that for the for 1D > case we have an upper estimation of the number of points that the tree will > be indexing. > * Add a mechanism so field types can easily define their best points per > leaf. In the case, field types like ranges or shapes can define its own value > to minimise overlap. > * Maybe the default is just too high now that we can load the tree off-heap. > > Any thoughts? > > > > > > -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org