jpountz commented on code in PR #12622: URL: https://github.com/apache/lucene/pull/12622#discussion_r1361783707
########## lucene/core/src/java/org/apache/lucene/index/IndexWriter.java: ########## @@ -5144,20 +5145,71 @@ public int length() { } mergeReaders.add(wrappedReader); } + + MergeState.DocMap[] reorderDocMaps = null; + if (config.getIndexSort() == null) { + // Create a merged view of the input segments. This effectively does the merge. + CodecReader mergedView = SlowCompositeCodecReaderWrapper.wrap(mergeReaders); + Sorter.DocMap docMap = merge.reorder(mergedView, directory); + if (docMap != null) { + reorderDocMaps = new MergeState.DocMap[mergeReaders.size()]; + int docBase = 0; + int i = 0; + for (CodecReader reader : mergeReaders) { + final int finalDocBase = docBase; + reorderDocMaps[i] = + new MergeState.DocMap() { + @Override + public int get(int docID) { + Objects.checkIndex(docID, reader.maxDoc()); + return docMap.oldToNew(finalDocBase + docID); + } + }; + i++; + docBase += reader.maxDoc(); + } + // This makes merging more expensive as it disables some bulk merging optimizations, so + // only do this if a non-null DocMap is returned. + mergeReaders = + Collections.singletonList(SortingCodecReader.wrap(mergedView, docMap, null)); + } + } + final SegmentMerger merger = new SegmentMerger( mergeReaders, merge.info.info, infoStream, dirWrapper, globalFieldNumberMap, context); merge.info.setSoftDelCount(Math.toIntExact(softDeleteCount.get())); merge.checkAborted(); + MergeState mergeState = merger.mergeState; + MergeState.DocMap[] docMaps; + if (reorderDocMaps == null) { + docMaps = mergeState.docMaps; + } else { + assert mergeState.docMaps.length == 1; + MergeState.DocMap compactionDocMap = mergeState.docMaps[0]; + docMaps = new MergeState.DocMap[reorderDocMaps.length]; + for (int i = 0; i < docMaps.length; ++i) { + MergeState.DocMap reorderDocMap = reorderDocMaps[i]; + docMaps[i] = + new MergeState.DocMap() { + @Override + public int get(int docID) { + int reorderedDocId = reorderDocMap.get(docID); Review Comment: We could try to combine the two doc maps. I did it this way to try to keep the reordering logic and the merging logic as independent as possible. On the one hand, we have the reordering logic that works on a merged view of the input segments and doesn't care about deletes. On the other hand, we have the merge logic that computes the mapping between doc IDs in input segments and doc IDs in the merged segment (often just compacting deletes, ie. if index sorting is not enabled). If we wanted to better combine these two things, we'd need to either ignore the `MergeState`'s doc maps or somehow make `MergeState` aware of the reordering. Today `MergeState` is completely unaware of the reordering, from its perspective it just needs to run a singleton merge (single input codec reader) and its only job is to reclaim deletes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org