[jira] [Commented] (LUCENE-9507) Custom order for leaves in DirectoryReader, IndexWriter and searcher

ASF subversion and git services (Jira) Fri, 30 Jul 2021 06:16:05 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17390571#comment-17390571
 ]


ASF subversion and git services commented on LUCENE-9507:
---------------------------------------------------------

Commit 1daf7e7c742cf53cb62a55bc3993a76d878e3223 in lucene's branch 
refs/heads/main from Mayya Sharipova
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=1daf7e7 ]

LUCENE-10027 provide leaf sorter from commit (#214)

Provide leaf sorter for directory readers opened from IndexCommit

LUCENE-9507 made it possible to provide a leaf sorter for directory readers.
One API that was missed is the one that opens a directory reader
from an index commit.
This patch addresses that by adding an extra parameter, a custom
comparator for sorting leaf readers, to the DirectoryReader open API
that takes an indexCommit and a minSupportedMajorVersion.

Relates to PR #32

> Custom order for leaves in DirectoryReader, IndexWriter and searcher
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9507
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9507
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Jim Ferenczi
>            Priority: Minor
>             Fix For: main (9.0), 8.9
>
>          Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Now that we're able [to skip documents efficiently when sorting by a numeric 
> field|https://issues.apache.org/jira/browse/LUCENE-9280], I was wondering if 
> we could optimize sorted queries further by also sorting the leaf readers 
> based on the primary sort.
> For time-based indices in Elasticsearch, we've implemented an optimization 
> that does that at query time. If the query is sorted by a numeric docvalue 
> field, prior to search, we sort the leaves according to the query sort. When 
> sorting by timestamp this small optimization can have a big impact since 
> early termination can be reached much faster if the sort values in the 
> segments don't overlap too much. Applying this optimization at query time is 
> challenging: it has the benefit of working on any numeric field sort and 
> order, but it requires using a multi-reader that reorganizes the segments. It 
> can also be surprising that, after a force merge to 1 segment, sorted queries 
> may be slower since there is nothing to sort anymore.
> So, another option that I am looking at is to add the ability to provide a leaf 
> order directly in the IndexWriter and DirectoryReader. That could be similar 
> to an index sort or even complementary to it since sorting segments based on 
> the index sort could also help at query time. For time-based indices that 
> cannot afford index sorting but have lots of sorted queries on timestamp, 
> forcing the order of segments could speed up sorted queries significantly. 
> The advantage of forcing a single leaf sort in the writer/reader is that we 
> can also use it to influence the merges by putting the segments with the 
> highest value first. That would help indices that are merged to a single 
> segment but still need fast sorted queries, and it would also help the 
> multi-segment case, since big segments would be more likely to have the 
> highest values first too.
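
The effect described in the issue above can be illustrated with a small, self-contained sketch. It does not use the Lucene API: the Segment class, the NEWEST_FIRST comparator and the segmentsScanned method are hypothetical stand-ins, and the skipping rule is a simplification of what a real top-k collector does. It only shows why visiting the leaves with the highest values first lets a descending sort on a timestamp skip the remaining leaves when their value ranges do not overlap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class LeafSortSketch {

    /** Hypothetical stand-in for a leaf (segment); not a Lucene class. */
    static final class Segment {
        final String name;
        final long[] timestamps; // assumed non-empty and in ascending order

        Segment(String name, long[] timestamps) {
            this.name = name;
            this.timestamps = timestamps;
        }

        long max() {
            return timestamps[timestamps.length - 1];
        }
    }

    /** Orders leaves so that the one with the highest maximum timestamp comes first. */
    static final Comparator<Segment> NEWEST_FIRST =
            (a, b) -> Long.compare(b.max(), a.max());

    /**
     * Collects the k largest timestamps, visiting leaves in the given order,
     * and returns how many leaves had to be scanned. A leaf is skipped when
     * the top-k queue is full and the leaf's maximum cannot beat the
     * smallest value currently in the queue.
     */
    static int segmentsScanned(List<Segment> leaves, int k) {
        PriorityQueue<Long> topK = new PriorityQueue<>(); // min-heap of the current top k
        int scanned = 0;
        for (Segment leaf : leaves) {
            if (topK.size() == k && leaf.max() <= topK.peek()) {
                continue; // no value in this leaf can be competitive
            }
            scanned++;
            for (long ts : leaf.timestamps) {
                if (topK.size() < k) {
                    topK.add(ts);
                } else if (ts > topK.peek()) {
                    topK.poll();
                    topK.add(ts);
                }
            }
        }
        return scanned;
    }

    public static void main(String[] args) {
        List<Segment> leaves = new ArrayList<>();
        leaves.add(new Segment("old", new long[] {1, 2, 3}));
        leaves.add(new Segment("mid", new long[] {4, 5, 6}));
        leaves.add(new Segment("new", new long[] {7, 8, 9}));

        // Oldest leaf first: every leaf still holds competitive values.
        System.out.println(segmentsScanned(leaves, 2)); // prints 3

        // Newest leaf first: the top 2 are found in the first leaf.
        leaves.sort(NEWEST_FIRST);
        System.out.println(segmentsScanned(leaves, 2)); // prints 1
    }
}
```

In Lucene itself the comparator would operate on leaf readers rather than on this toy Segment type, and the skipping is done by the collector and the numeric sort optimization rather than by a per-leaf maximum check.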



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

