stefanvodita commented on code in PR #13542: URL: https://github.com/apache/lucene/pull/13542#discussion_r1665869563
########## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ########## @@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) { /** Static method to segregate LeafReaderContexts amongst multiple slices */ public static LeafSlice[] slices( List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) { + + // TODO this is a temporary hack to force testing against multiple leaf reader context slices. + // It must be reverted before merging. + maxDocsPerSlice = 1; + maxSegmentsPerSlice = 1; + // end hack + // Make a copy so we can sort: List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves); // Sort by maxDoc, descending: - Collections.sort( - sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc()))); + sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc()))); - final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>(); - long docSum = 0; - List<LeafReaderContext> group = null; + final List<List<LeafReaderContextPartition>> groupedLeafPartitions = new ArrayList<>(); + int currentSliceNumDocs = 0; + List<LeafReaderContextPartition> group = null; for (LeafReaderContext ctx : sortedLeaves) { if (ctx.reader().maxDoc() > maxDocsPerSlice) { assert group == null; - groupedLeaves.add(Collections.singletonList(ctx)); + // if the segment does not fit in a single slice, we split it in multiple partitions of Review Comment: Thank you for moving forward on this issue @javanna! I had a different strategy in mind for slicing the index. With the current implementation, we deduce the number of slices based on a given per-slice doc count. What if the number of slices was given instead? Each segment would not be divided into n slices, but a slice could straddle over a few segments. I think it gives us better control over the concurrency and leaves us with fewer slices per segment while achieving the same level of concurrency, which is better because it reduces the fixed cost of search that we pay per slice per segment. Are there challenges that make that hard to implement? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org