Re: [PR] WIP: draft of intra segment concurrency [lucene]

via GitHub Mon, 08 Jul 2024 01:41:11 -0700


javanna commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1668225258



##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> 
leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
       List<LeafReaderContext> leaves, int maxDocsPerSlice, int 
maxSegmentsPerSlice) {
+
+    // TODO this is a temporary hack to force testing against multiple leaf 
reader context slices.
+    // It must be reverted before merging.
+    maxDocsPerSlice = 1;
+    maxSegmentsPerSlice = 1;
+    // end hack
+
     // Make a copy so we can sort:
     List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);
 
     // Sort by maxDoc, descending:
-    Collections.sort(
-        sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> 
l.reader().maxDoc())));
+    sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> 
l.reader().maxDoc())));
 
-    final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>();
-    long docSum = 0;
-    List<LeafReaderContext> group = null;
+    final List<List<LeafReaderContextPartition>> groupedLeafPartitions = new 
ArrayList<>();
+    int currentSliceNumDocs = 0;
+    List<LeafReaderContextPartition> group = null;
     for (LeafReaderContext ctx : sortedLeaves) {
       if (ctx.reader().maxDoc() > maxDocsPerSlice) {
         assert group == null;
-        groupedLeaves.add(Collections.singletonList(ctx));
+        // if the segment does not fit in a single slice, we split it in 
multiple partitions of

Review Comment:
   Thanks for your feedback @mikemccand ! You are right to be confused, this is 
a POC and details about abstractions needs refining. 
   
   Here is how I got to this place: we have slices that target multiple leaf 
reader contexts, and I wanted to be able to have a slice still targeting 
multiple segments, at the same time targeting a partition of a segment. The 
easier way to do that was to wrap a leaf reader context into a new abstraction 
that holds a leaf reader context and defines a range of doc ids as well. A 
partition can still point to the entire segment though.
   
   Technically, a slice is a set of segments, yet a partition of an index, 
while a leaf reader context partition is a partition of a segment, hence the 
confusion :)  One natural next step could be to fold the range of doc ids into 
the leaf reader context perhaps, but that is a substantial API change. 
Otherwise keeping the structure i came up with, but renaming things to clarify 
the concepts and reduce confusion. Do you have other options in mind?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] WIP: draft of intra segment concurrency [lucene]

Reply via email to