Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

via GitHub Tue, 17 Oct 2023 02:10:48 -0700


jpountz commented on code in PR #12622:
URL: https://github.com/apache/lucene/pull/12622#discussion_r1361783707



##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -5144,20 +5145,71 @@ public int length() {
         }
         mergeReaders.add(wrappedReader);
       }
+
+      MergeState.DocMap[] reorderDocMaps = null;
+      if (config.getIndexSort() == null) {
+        // Create a merged view of the input segments. This effectively does 
the merge.
+        CodecReader mergedView = 
SlowCompositeCodecReaderWrapper.wrap(mergeReaders);
+        Sorter.DocMap docMap = merge.reorder(mergedView, directory);
+        if (docMap != null) {
+          reorderDocMaps = new MergeState.DocMap[mergeReaders.size()];
+          int docBase = 0;
+          int i = 0;
+          for (CodecReader reader : mergeReaders) {
+            final int finalDocBase = docBase;
+            reorderDocMaps[i] =
+                new MergeState.DocMap() {
+                  @Override
+                  public int get(int docID) {
+                    Objects.checkIndex(docID, reader.maxDoc());
+                    return docMap.oldToNew(finalDocBase + docID);
+                  }
+                };
+            i++;
+            docBase += reader.maxDoc();
+          }
+          // This makes merging more expensive as it disables some bulk 
merging optimizations, so
+          // only do this if a non-null DocMap is returned.
+          mergeReaders =
+              Collections.singletonList(SortingCodecReader.wrap(mergedView, 
docMap, null));
+        }
+      }
+
       final SegmentMerger merger =
           new SegmentMerger(
               mergeReaders, merge.info.info, infoStream, dirWrapper, 
globalFieldNumberMap, context);
       merge.info.setSoftDelCount(Math.toIntExact(softDeleteCount.get()));
       merge.checkAborted();
 
+      MergeState mergeState = merger.mergeState;
+      MergeState.DocMap[] docMaps;
+      if (reorderDocMaps == null) {
+        docMaps = mergeState.docMaps;
+      } else {
+        assert mergeState.docMaps.length == 1;
+        MergeState.DocMap compactionDocMap = mergeState.docMaps[0];
+        docMaps = new MergeState.DocMap[reorderDocMaps.length];
+        for (int i = 0; i < docMaps.length; ++i) {
+          MergeState.DocMap reorderDocMap = reorderDocMaps[i];
+          docMaps[i] =
+              new MergeState.DocMap() {
+                @Override
+                public int get(int docID) {
+                  int reorderedDocId = reorderDocMap.get(docID);

Review Comment:
   We could try to combine the two doc maps.
   
   I did it this way to try to keep the reordering logic and the merging logic 
as independent as possible. On the one hand, we have the reordering logic that 
works on a merged view of the input segments and doesn't care about deletes. On 
the other hand, we have the merge logic that computes the mapping between doc 
IDs in input segments and doc IDs in the merged segment (often just compacting 
deletes, ie. if index sorting is not enabled).
   
   If we wanted to better combine these two things, we'd need to either ignore 
the `MergeState`'s doc maps or somehow make `MergeState` aware of the 
reordering. Today `MergeState` is completely unaware of the reordering, from 
its perspective it just needs to run a singleton merge (single input codec 
reader) and its only job is to reclaim deletes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Add a merge policy wrapper that performs recursive graph bisection on merge. [lucene]

Reply via email to