msokolov commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1668890994
##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
       List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+    // TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+    // It must be reverted before merging.
+    maxDocsPerSlice = 1;
+    maxSegmentsPerSlice = 1;
+    // end hack
+
     // Make a copy so we can sort:
     List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);

     // Sort by maxDoc, descending:
-    Collections.sort(
-        sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
+    sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));

-    final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>();
-    long docSum = 0;
-    List<LeafReaderContext> group = null;
+    final List<List<LeafReaderContextPartition>> groupedLeafPartitions = new ArrayList<>();
+    int currentSliceNumDocs = 0;
+    List<LeafReaderContextPartition> group = null;
     for (LeafReaderContext ctx : sortedLeaves) {
       if (ctx.reader().maxDoc() > maxDocsPerSlice) {
         assert group == null;
-        groupedLeaves.add(Collections.singletonList(ctx));
+        // if the segment does not fit in a single slice, we split it in multiple partitions of

Review Comment:
   I had worked up a version of this where I modified LeafReaderContext/IndexReaderContext to create a new kind of context that models a range within a segment. I added interval start/end to LeafReaderContext, but I suspect a cleaner way would be to make a new thing (IntervalReaderContext or so) and then change APIs to expect IndexReaderContext instead of CompositeReaderContext? Done that way, it might be easier to handle some cases like the single-threaded execution you mentioned.

   But this is more about cleaning up the APIs than about making it work, and we can argue endlessly about what is neater, so I think your approach of deferring such questions makes sense.

##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
       List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+    // TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+    // It must be reverted before merging.
+    maxDocsPerSlice = 1;
+    maxSegmentsPerSlice = 1;
+    // end hack
+
     // Make a copy so we can sort:
     List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);

     // Sort by maxDoc, descending:
-    Collections.sort(
-        sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
+    sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));

-    final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>();
-    long docSum = 0;
-    List<LeafReaderContext> group = null;
+    final List<List<LeafReaderContextPartition>> groupedLeafPartitions = new ArrayList<>();
+    int currentSliceNumDocs = 0;
+    List<LeafReaderContextPartition> group = null;
     for (LeafReaderContext ctx : sortedLeaves) {
       if (ctx.reader().maxDoc() > maxDocsPerSlice) {
         assert group == null;
-        groupedLeaves.add(Collections.singletonList(ctx));
+        // if the segment does not fit in a single slice, we split it in multiple partitions of
+        // equal size
+        int numSlices = Math.ceilDiv(ctx.reader().maxDoc(), maxDocsPerSlice);

Review Comment:
   My mental model of the whole slice/partition/segment/interval concept is: existing physical segments (leaves) divide the index into arbitrary sizes; existing slices (what we have today, not what this PR calls slices) group segments together.
   Partitions or intervals (in my view) are a logical division of the index into roughly equal-sized, contiguous (in docid space) portions, and they overlay the segments arbitrarily. It is then the job of IndexSearcher to map this logical division of work onto the underlying physical segments. The main comment here: let's not confuse ourselves by re-using the word "slice", which already means something else!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
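[Editorial aside] The `Math.ceilDiv` splitting that the second review comment anchors on can be sketched in isolation. The following is a hedged illustration, not the PR's actual code: the `Partition` record and `split` helper are invented here, and only the round-up arithmetic (`Math.ceilDiv`, Java 18+) is taken from the diff. It shows how one oversized segment can be cut into roughly equal, contiguous docid ranges.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch {
  /** A half-open docid interval [start, end) within a single segment (illustrative only). */
  record Partition(int start, int end) {}

  /** Splits a segment of maxDoc docs into roughly equal contiguous partitions. */
  static List<Partition> split(int maxDoc, int maxDocsPerSlice) {
    // Round up, so a segment slightly over the limit still gets an extra partition.
    int numPartitions = Math.ceilDiv(maxDoc, maxDocsPerSlice);
    // Round up again so numPartitions ranges of this size cover all of [0, maxDoc).
    int partitionSize = Math.ceilDiv(maxDoc, numPartitions);
    List<Partition> partitions = new ArrayList<>();
    for (int start = 0; start < maxDoc; start += partitionSize) {
      partitions.add(new Partition(start, Math.min(start + partitionSize, maxDoc)));
    }
    return partitions;
  }

  public static void main(String[] args) {
    // A 10-doc "segment" with at most 4 docs per slice yields 3 contiguous
    // partitions covering [0, 10) with no gaps or overlap.
    System.out.println(split(10, 4));
  }
}
```

Note how the partitions are contiguous in docid space and together cover the segment exactly once, matching the "logical division of the index" mental model described above.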