[ 
https://issues.apache.org/jira/browse/LUCENE-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191638#comment-17191638
 ] 

Jim Ferenczi commented on LUCENE-9507:
--------------------------------------

Thanks for looking, [~sokolov]. I was thinking that the merge policy would be 
just a thin wrapper that sorts whichever segments are chosen. 
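
A minimal sketch of that wrapper idea, in plain Java with no Lucene types. Everything here (Seg, MergeSelector, SortingSelector, newestFirst) is a hypothetical name made up for illustration, not the actual MergePolicy API: the wrapper leaves the choice of segments to its delegate and only reorders the segments inside each selected merge.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortingMergeSketch {

    /** Hypothetical segment descriptor; not a Lucene class. */
    static final class Seg {
        final String name;
        final long maxTimestamp;

        Seg(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Hypothetical stand-in for a merge policy: groups segments into merges. */
    interface MergeSelector {
        List<List<Seg>> selectMerges(List<Seg> segments);
    }

    /** Thin wrapper: delegates the selection, then sorts the segments of each merge. */
    static final class SortingSelector implements MergeSelector {
        private final MergeSelector delegate;
        private final Comparator<Seg> order;

        SortingSelector(MergeSelector delegate, Comparator<Seg> order) {
            this.delegate = delegate;
            this.order = order;
        }

        @Override
        public List<List<Seg>> selectMerges(List<Seg> segments) {
            List<List<Seg>> merges = new ArrayList<>();
            for (List<Seg> merge : delegate.selectMerges(segments)) {
                List<Seg> sorted = new ArrayList<>(merge); // never mutate the delegate's list
                sorted.sort(order);
                merges.add(sorted);
            }
            return merges;
        }
    }

    /** Assumed leaf order: segments holding the highest timestamps come first. */
    static Comparator<Seg> newestFirst() {
        return Comparator.comparingLong((Seg s) -> s.maxTimestamp).reversed();
    }

    static List<String> names(List<Seg> segs) {
        List<String> out = new ArrayList<>();
        for (Seg s : segs) {
            out.add(s.name);
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy delegate: merge every segment into a single merge, in the given order.
        MergeSelector mergeAll = segs -> Collections.singletonList(segs);
        MergeSelector sorting = new SortingSelector(mergeAll, newestFirst());

        List<Seg> segments = Arrays.asList(
            new Seg("_a", 10), new Seg("_b", 30), new Seg("_c", 20));
        System.out.println(names(sorting.selectMerges(segments).get(0))); // prints [_b, _c, _a]
    }
}
```

The delegate stays responsible for which segments get merged, so any existing selection logic could be reused unchanged; only the order within a merge is affected.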

> Custom order for leaves in DirectoryReader, IndexWriter and searcher
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9507
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9507
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Jim Ferenczi
>            Priority: Minor
>
> Now that we're able [to skip documents efficiently when sorting by a numeric 
> field|https://issues.apache.org/jira/browse/LUCENE-9280], I was wondering if 
> we could optimize sorted queries further by also sorting the leaf readers 
> based on the primary sort.
> For time-based indices in Elasticsearch, we've implemented an optimization 
> that does that at query time. If the query is sorted by a numeric docvalue 
> field, prior to search, we sort the leaves according to the query sort. When 
> sorting by timestamp this small optimization can have a big impact since 
> early termination can be reached much faster if the sort values in the 
> segments don't overlap too much. Applying this optimization at query time is 
> challenging: it has the benefit of working with any numeric field sort and 
> order, but it requires a multi-reader that reorganizes the segments. It can 
> also be surprising that, after a force merge to one segment, sorted queries 
> may be slower since there are no leaves left to sort.
> So another option that I'm looking at is to add the ability to provide a leaf 
> order directly in the IndexWriter and DirectoryReader. That could be similar 
> to an index sort or even complementary to it since sorting segments based on 
> the index sort could also help at query time. For time-based indices that 
> cannot afford index sorting but have lots of sorted queries on timestamp, 
> forcing the order of segments could speed up sorted queries significantly. 
> The advantage of forcing a single leaf sort in the writer/reader is that we 
> can also use it to influence the merges by putting the segments with the 
> highest value first. That would help indices that are merged to a single 
> segment but still need fast sorted queries, and it would also help the 
> multi-segment case, since big segments would be more likely to hold the 
> highest values first too.
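
To illustrate why leaf order matters for early termination, here is a small self-contained sketch in plain Java. It is a toy model, not Lucene code: Segment, segmentsVisited and sortedByMaxDescending are hypothetical names, and a segment is reduced to an array of timestamps plus its maximum. A top-k collector skips a whole segment once its queue is full and the segment's maximum cannot compete, so visiting the segments with the highest values first lets it skip the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class LeafOrderSketch {

    /** Hypothetical stand-in for a segment: only the timestamps of its documents. */
    static final class Segment {
        final String name;
        final long[] timestamps;
        final long maxTimestamp;

        Segment(String name, long... timestamps) {
            this.name = name;
            this.timestamps = timestamps;
            this.maxTimestamp = Arrays.stream(timestamps).max().orElse(Long.MIN_VALUE);
        }
    }

    /** Counts the segments that must be scanned to collect the k newest documents. */
    static int segmentsVisited(List<Segment> leaves, int k) {
        // Min-heap holding the k largest timestamps seen so far.
        PriorityQueue<Long> topK = new PriorityQueue<>();
        int visited = 0;
        for (Segment leaf : leaves) {
            // Skip the whole segment when the queue is full and nothing in it can compete.
            if (topK.size() == k && leaf.maxTimestamp <= topK.peek()) {
                continue;
            }
            visited++;
            for (long ts : leaf.timestamps) {
                if (topK.size() < k) {
                    topK.add(ts);
                } else if (ts > topK.peek()) {
                    topK.poll();
                    topK.add(ts);
                }
            }
        }
        return visited;
    }

    /** Returns a copy of the leaves ordered by maximum timestamp, highest first. */
    static List<Segment> sortedByMaxDescending(List<Segment> leaves) {
        List<Segment> sorted = new ArrayList<>(leaves);
        sorted.sort(Comparator.comparingLong((Segment s) -> s.maxTimestamp).reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Time-based data: segments written in chronological order, no overlap.
        List<Segment> leaves = Arrays.asList(
            new Segment("_0", 1, 2, 3),
            new Segment("_1", 4, 5, 6),
            new Segment("_2", 7, 8, 9));

        System.out.println(segmentsVisited(leaves, 2));                        // prints 3
        System.out.println(segmentsVisited(sortedByMaxDescending(leaves), 2)); // prints 1
    }
}
```

With non-overlapping segments the sorted order needs a single segment for the top 2; the more the value ranges overlap, the smaller the gain.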



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
