Re: [PR] Add support for intra-segment search concurrency [lucene]

via GitHub Wed, 31 Jul 2024 06:12:38 -0700


stefanvodita commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1698485404



##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> 
leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
       List<LeafReaderContext> leaves, int maxDocsPerSlice, int 
maxSegmentsPerSlice) {
+
+    // TODO this is a temporary hack to force testing against multiple leaf 
reader context slices.
+    // It must be reverted before merging.
+    maxDocsPerSlice = 1;
+    maxSegmentsPerSlice = 1;
+    // end hack
+
     // Make a copy so we can sort:
     List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);
 
     // Sort by maxDoc, descending:
-    Collections.sort(
-        sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> 
l.reader().maxDoc())));
+    sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> 
l.reader().maxDoc())));
 
-    final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>();
-    long docSum = 0;
-    List<LeafReaderContext> group = null;
+    final List<List<LeafReaderContextPartition>> groupedLeafPartitions = new 
ArrayList<>();
+    int currentSliceNumDocs = 0;
+    List<LeafReaderContextPartition> group = null;
     for (LeafReaderContext ctx : sortedLeaves) {
       if (ctx.reader().maxDoc() > maxDocsPerSlice) {
         assert group == null;
-        groupedLeaves.add(Collections.singletonList(ctx));
+        // if the segment does not fit in a single slice, we split it in 
multiple partitions of

Review Comment:
   Sorry I haven't been able to keep up with all the updates on the PR. I still 
think there's a better partitioning [name up for debate] strategy, that will 
make searching more efficient.
   
   @msokolov [put 
it](https://github.com/apache/lucene/pull/13542/files#r1668914408) really well:
   > partitions or intervals (in my view) are a logical division of the index 
into roughly equal-sized contiguous (in docid space) portions and they overlay 
the segments arbitrarily
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Add support for intra-segment search concurrency [lucene]

Reply via email to