[
https://issues.apache.org/jira/browse/LUCENE-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390571#comment-17390571
]
ASF subversion and git services commented on LUCENE-9507:
---------------------------------------------------------
Commit 1daf7e7c742cf53cb62a55bc3993a76d878e3223 in lucene's branch
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1daf7e7 ]
LUCENE-10027 provide leaf sorter from commit (#214)
Provide leaf sorter for directory readers opened from IndexCommit
LUCENE-9507 made it possible to provide a leaf sorter for directory readers.
One API that was missed is the one that opens a directory reader
from an index commit.
This patch addresses this by adding an extra parameter, a custom
comparator for sorting leaf readers, to the DirectoryReader open API
that takes an indexCommit and minSupportedMajorVersion.
Relates to PR #32
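
To illustrate the kind of comparator the commit message refers to, here is a
minimal sketch that uses only the JDK. The Leaf class, its maxTimestamp field,
and the NEWEST_FIRST comparator are hypothetical stand-ins, not Lucene classes;
in Lucene the comparator would operate on leaf readers and be passed to the
DirectoryReader open API described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class LeafSorterSketch {
    /** Hypothetical stand-in for a segment (leaf); not a Lucene class. */
    static final class Leaf {
        final String name;
        final long maxTimestamp;

        Leaf(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Orders leaves so that the one holding the most recent timestamp comes first. */
    static final Comparator<Leaf> NEWEST_FIRST =
        Comparator.comparingLong((Leaf l) -> l.maxTimestamp).reversed();

    /** Returns the leaf names in the order the comparator would expose them. */
    static List<String> sortedNames(List<Leaf> leaves) {
        List<Leaf> copy = new ArrayList<>(leaves); // leave the caller's list untouched
        copy.sort(NEWEST_FIRST);
        List<String> names = new ArrayList<>();
        for (Leaf leaf : copy) {
            names.add(leaf.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Leaf> leaves = List.of(
            new Leaf("_0", 100L), new Leaf("_1", 300L), new Leaf("_2", 200L));
        System.out.println(sortedNames(leaves)); // prints [_1, _2, _0]
    }
}
```

The comparator only decides the order in which leaves are presented; it does
not change the contents of any segment.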
> Custom order for leaves in DirectoryReader, IndexWriter and searcher
> --------------------------------------------------------------------
>
> Key: LUCENE-9507
> URL: https://issues.apache.org/jira/browse/LUCENE-9507
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Jim Ferenczi
> Priority: Minor
> Fix For: main (9.0), 8.9
>
> Time Spent: 5h 50m
> Remaining Estimate: 0h
>
> Now that we're able [to skip documents efficiently when sorting by a numeric
> field|https://issues.apache.org/jira/browse/LUCENE-9280], I was wondering if
> we could optimize sorted queries further by also sorting the leaf readers
> based on the primary sort.
> For time-based indices in Elasticsearch, we've implemented an optimization
> that does that at query time. If the query is sorted by a numeric docvalue
> field, we sort the leaves according to the query sort before searching. When
> sorting by timestamp, this small optimization can have a big impact, since
> early termination can be reached much faster if the sort values in the
> segments don't overlap too much. Applying this optimization at query time is
> challenging: it has the benefit of working with any numeric field sort and
> order, but it requires a multi-reader that reorganizes the segments. It
> can also be counterintuitive that, after a force merge to 1 segment, sorted
> queries may be slower since there is nothing left to sort.
> So, another option that I am looking at is to add the ability to provide a
> leaf order directly in the IndexWriter and DirectoryReader. That could be
> similar to an index sort, or even complementary to it, since sorting segments
> based on the index sort could also help at query time. For time-based indices
> that cannot afford index sorting but have lots of sorted queries on timestamp,
> forcing the order of segments could speed up sorted queries significantly.
> The advantage of forcing a single leaf sort in the writer/reader is that we
> can also use it to influence the merges by putting the segments with the
> highest value first. That would help indices that are merged to a single
> segment but still need fast sorted queries, and it would also help the
> multi-segment case, since big segments would be more likely to have the
> highest values first too.
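
The early-termination argument in the description above can be sketched with a
small JDK-only model. Everything here is an assumption made for illustration:
the Segment class, the segmentsScanned and newestFirst helpers, and the sample
data are hypothetical and do not reflect how Lucene collects hits. The model
collects the k largest timestamps and skips any segment whose maximum cannot
beat the current k-th best value.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class LeafOrderEarlyTermination {
    /** Hypothetical segment model: only the timestamps it holds and their maximum. */
    static final class Segment {
        final long[] timestamps;
        final long max;

        Segment(long... timestamps) {
            this.timestamps = timestamps;
            long m = Long.MIN_VALUE;
            for (long t : timestamps) {
                m = Math.max(m, t);
            }
            this.max = m;
        }
    }

    /** Counts the segments that must be scanned to collect the k largest timestamps. */
    static int segmentsScanned(List<Segment> segments, int k) {
        if (k <= 0) {
            return 0; // nothing to collect
        }
        PriorityQueue<Long> topK = new PriorityQueue<>(); // min-heap: head is the weakest kept hit
        int scanned = 0;
        for (Segment segment : segments) {
            // Skip a segment that cannot contain a competitive value.
            if (topK.size() == k && segment.max <= topK.peek()) {
                continue;
            }
            scanned++;
            for (long t : segment.timestamps) {
                if (topK.size() < k) {
                    topK.add(t);
                } else if (t > topK.peek()) {
                    topK.poll();
                    topK.add(t);
                }
            }
        }
        return scanned;
    }

    /** Returns a copy of the segments ordered by descending maximum timestamp. */
    static List<Segment> newestFirst(List<Segment> segments) {
        List<Segment> copy = new ArrayList<>(segments);
        copy.sort(Comparator.comparingLong((Segment s) -> s.max).reversed());
        return copy;
    }

    /** Sample data: three non-overlapping segments, oldest first. */
    static List<Segment> example() {
        return List.of(new Segment(1, 2, 3), new Segment(4, 5, 6), new Segment(7, 8, 9));
    }

    public static void main(String[] args) {
        System.out.println(segmentsScanned(example(), 2));              // prints 3
        System.out.println(segmentsScanned(newestFirst(example()), 2)); // prints 1
    }
}
```

With non-overlapping segments visited oldest first, every segment still holds a
competitive value, so all three are scanned. Visiting the newest segment first
fills the top 2 immediately and lets the other two be skipped.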