[ https://issues.apache.org/jira/browse/LUCENE-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191638#comment-17191638 ]
Jim Ferenczi commented on LUCENE-9507:
--------------------------------------

Thanks for looking, [~sokolov]. I was thinking that the merge policy would just be a thin wrapper that sorts whichever segments are chosen.

> Custom order for leaves in DirectoryReader, IndexWriter and searcher
> --------------------------------------------------------------------
>
>                 Key: LUCENE-9507
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9507
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Jim Ferenczi
>            Priority: Minor
>
> Now that we're able [to skip documents efficiently when sorting by a numeric
> field|https://issues.apache.org/jira/browse/LUCENE-9280], I was wondering if
> we could optimize sorted queries further by also sorting the leaf readers
> based on the primary sort.
>
> For time-based indices in Elasticsearch, we've implemented an optimization
> that does this at query time: if the query is sorted by a numeric doc-values
> field, we sort the leaves according to the query sort before searching. When
> sorting by timestamp, this small optimization can have a big impact, since
> early termination is reached much faster when the sort values of the
> segments don't overlap much. Applying this optimization at query time is
> challenging: it has the benefit of working with any numeric sort field and
> order, but it requires a multi-reader that reorganizes the segments. It can
> also be disappointing that, after a force merge to a single segment, sorted
> queries may become slower, since there is nothing left to sort.
>
> So another option I looked at is adding the ability to provide a leaf order
> directly in the IndexWriter and DirectoryReader. This could be similar to an
> index sort, or even complementary to it, since sorting segments based on the
> index sort could also help at query time.
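The query-time optimization described above works by sorting the leaves by the query's sort field and then letting early termination kick in. It can be sketched with a simplified, stdlib-only model. This is not Lucene code. `LeafOrderDemo`, `Segment`, `topKVisited` and `sortLeavesDesc` are hypothetical names, and each segment is reduced to an array of timestamp values. The model also skips a whole segment once its maximum cannot beat the current k-th best hit, which is only a coarse stand-in for the per-document skipping referenced in LUCENE-9280.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical model, not Lucene API: a "leaf" is just the timestamp values of one segment.
class LeafOrderDemo {
    record Segment(String name, long[] timestamps) {
        long max() {
            long m = Long.MIN_VALUE;
            for (long t : timestamps) {
                m = Math.max(m, t);
            }
            return m;
        }
    }

    // Collects the k largest timestamps into `out` (descending) and returns how many docs were read.
    // A segment is skipped entirely when the queue is full and its max cannot beat the weakest hit
    // (a coarse stand-in for Lucene's per-document skipping). Assumes k >= 1.
    static int topKVisited(List<Segment> leaves, int k, List<Long> out) {
        PriorityQueue<Long> queue = new PriorityQueue<>(); // min-heap: head is the weakest of the top k
        int visited = 0;
        for (Segment seg : leaves) {
            if (queue.size() == k && seg.max() <= queue.peek()) {
                continue; // no document in this leaf can be competitive
            }
            for (long t : seg.timestamps()) {
                visited++;
                if (queue.size() < k) {
                    queue.add(t);
                } else if (t > queue.peek()) {
                    queue.poll();
                    queue.add(t);
                }
            }
        }
        out.clear();
        out.addAll(queue);
        out.sort(Comparator.reverseOrder());
        return visited;
    }

    // Query-time reordering: visit the leaves with the highest max timestamp first.
    static List<Segment> sortLeavesDesc(List<Segment> leaves) {
        List<Segment> sorted = new ArrayList<>(leaves);
        sorted.sort((a, b) -> Long.compare(b.max(), a.max()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Segment> leaves = List.of( // oldest segment first
                new Segment("old", new long[] {1, 2, 3, 4}),
                new Segment("mid", new long[] {5, 6, 7, 8}),
                new Segment("new", new long[] {9, 10, 11, 12}));
        List<Long> hits = new ArrayList<>();
        int unsorted = topKVisited(leaves, 3, hits);
        int sorted = topKVisited(sortLeavesDesc(leaves), 3, hits);
        // prints "[12, 11, 10] unsorted=12 sorted=4"
        System.out.println(hits + " unsorted=" + unsorted + " sorted=" + sorted);
    }
}
```

With the sample data, visiting the leaves in their original oldest-first order reads all 12 documents. The reordered pass reads only the 4 documents of the newest segment, because the other two segments' maxima cannot beat the third-best hit. The gain depends on how little the segments' value ranges overlap, as the description notes.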
>
> For time-based indices that cannot afford index sorting but have lots of
> sorted queries on timestamp, forcing the order of segments could speed up
> sorted queries significantly.
>
> The advantage of forcing a single leaf sort in the writer/reader is that we
> can also use it to influence merges by putting the segments with the highest
> values first. That would help indices that are merged down to a single
> segment but still want fast sorted queries, and also the multi-segment case,
> since big segments would be more likely to have the highest values first too.
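The merge-side idea can be modeled the same way: a thin wrapper sorts whichever segments the merge policy selects so that the highest values come first. This is again a hypothetical sketch, not Lucene's `MergePolicy` API. `MergeOrderDemo`, `orderForMerge`, `merge` and `docsUntilTopK` are illustrative names, and values are assumed to be distinct. The sketch also assumes that, without an index sort, the merged segment simply concatenates documents in the order the segments are passed to the merge. `docsUntilTopK` counts how many documents a scan in doc-ID order reads before it has seen all k largest timestamps. It serves as a rough proxy for how early a descending sort could terminate on the merged segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Lucene's MergePolicy API: a segment is its timestamp values in doc-ID order.
class MergeOrderDemo {
    record Segment(String name, long[] timestamps) {
        long max() {
            return Arrays.stream(timestamps).max().orElse(Long.MIN_VALUE);
        }
    }

    // The "thin wrapper" step: reorder whichever segments were selected, highest max first.
    static List<Segment> orderForMerge(List<Segment> selected) {
        List<Segment> ordered = new ArrayList<>(selected);
        ordered.sort((a, b) -> Long.compare(b.max(), a.max()));
        return ordered;
    }

    // Assumed merge behavior without an index sort: documents are concatenated segment by segment.
    static long[] merge(List<Segment> segments) {
        int total = 0;
        for (Segment s : segments) {
            total += s.timestamps().length;
        }
        long[] merged = new long[total];
        int pos = 0;
        for (Segment s : segments) {
            System.arraycopy(s.timestamps(), 0, merged, pos, s.timestamps().length);
            pos += s.timestamps().length;
        }
        return merged;
    }

    // Docs read in doc-ID order before all k largest values have been seen (assumes distinct values, k >= 1).
    static int docsUntilTopK(long[] docs, int k) {
        long[] sorted = docs.clone();
        Arrays.sort(sorted);
        long kthLargest = sorted[sorted.length - k];
        int seen = 0;
        for (int i = 0; i < docs.length; i++) {
            if (docs[i] >= kthLargest && ++seen == k) {
                return i + 1;
            }
        }
        return docs.length;
    }

    public static void main(String[] args) {
        List<Segment> selected = List.of(
                new Segment("old", new long[] {1, 2, 3, 4}),
                new Segment("mid", new long[] {5, 6, 7, 8}),
                new Segment("new", new long[] {9, 10, 11, 12}));
        int plain = docsUntilTopK(merge(selected), 3);
        int ordered = docsUntilTopK(merge(orderForMerge(selected)), 3);
        // prints "plain=12 ordered=4"
        System.out.println("plain=" + plain + " ordered=" + ordered);
    }
}
```

Under these assumptions, merging the three segments in their original order puts the three newest timestamps at the very end of the merged segment. The reordered merge puts them within its first four documents. This is the effect the description hopes to keep for indices that are merged down to a single segment.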