jpountz commented on code in PR #12685: URL: https://github.com/apache/lucene/pull/12685#discussion_r1361188738
##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfo.java:
##########
@@ -153,6 +157,16 @@ public boolean getUseCompoundFile() {
     return isCompoundFile;
   }
+  /** Returns true if this segment contains documents written as blocks. */

Review Comment:
   Add a link to `addDocuments` and `updateDocuments`? I wonder if this should be a bit more specific, e.g. "as blocks of 2 docs or more", to clarify that calling `addDocuments` with a single document doesn't count.

##########
lucene/core/src/test/org/apache/lucene/index/TestAddIndexes.java:
##########
@@ -1815,4 +1815,71 @@ public void testAddIndicesWithSoftDeletes() throws IOException {
     assertEquals(wrappedReader.numDocs(), writer.getDocStats().maxDoc);
     IOUtils.close(reader, writer, dir3, dir2, dir1);
   }
+
+  public void testAddIndicesWithBlocks() throws IOException {
+    boolean addHasBlocks = random().nextBoolean();
+    boolean baseHasBlocks = rarely();

Review Comment:
   All these cases look worth testing every time instead of randomly picking a single combination?

##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfo.java:
##########
@@ -153,6 +157,16 @@ public boolean getUseCompoundFile() {
     return isCompoundFile;
   }
+  /** Returns true if this segment contains documents written as blocks. */

Review Comment:
   Maybe also mention that this started being recorded in 9.9 and that indexes created earlier than that will return `false` regardless?
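
   On the `testAddIndicesWithBlocks` comment above, a minimal sketch of what covering every combination could look like. It only shows the loop shape: `AddIndexesBlockCases`, `allCases` and `describeCase` are hypothetical names, and `describeCase` stands in for the real test body, which would build both indexes, call `addIndexes` and assert on the result.

```java
import java.util.ArrayList;
import java.util.List;

class AddIndexesBlockCases {

  // Hypothetical stand-in for the real test body: the actual test would build
  // the base and added indexes with these settings and check the merged segments.
  static String describeCase(boolean baseHasBlocks, boolean addHasBlocks) {
    return "base=" + baseHasBlocks + ",add=" + addHasBlocks;
  }

  // Enumerate all four combinations instead of randomly picking a single one.
  static List<String> allCases() {
    List<String> cases = new ArrayList<>();
    for (boolean baseHasBlocks : new boolean[] {false, true}) {
      for (boolean addHasBlocks : new boolean[] {false, true}) {
        cases.add(describeCase(baseHasBlocks, addHasBlocks));
      }
    }
    return cases;
  }

  public static void main(String[] args) {
    for (String c : allCases()) {
      System.out.println(c);
    }
  }
}
```

   With only two booleans the full matrix is four runs, so the extra cost per test execution should stay small.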
##########
lucene/core/src/java/org/apache/lucene/index/IndexWriter.java:
##########
@@ -3368,9 +3368,15 @@ public void addIndexesReaderMerge(MergePolicy.OneMerge merge) throws IOException
     String mergedName = newSegmentName();
     Directory mergeDirectory = mergeScheduler.wrapForMerge(merge, directory);
     int numSoftDeleted = 0;
+    boolean hasBlocks = false;
     for (MergePolicy.MergeReader reader : merge.getMergeReader()) {
       CodecReader leaf = reader.codecReader;
       numDocs += leaf.numDocs();
+      if (reader.reader == null) {
+        hasBlocks = true; // NOCOMMIT: can we just assume that it has blocks and go with worst case here?

Review Comment:
   Maybe we could expose `getHasBlocks` in `LeafMetaData` to be able to get this information from a `CodecReader`?
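
   On the `LeafMetaData` suggestion for `IndexWriter.java`, a rough sketch of how the merged flag could be derived if each leaf reported whether it has blocks, rather than assuming the worst case. `LeafMeta` and `mergedHasBlocks` are hypothetical stand-ins written for illustration, not existing Lucene classes or methods.

```java
import java.util.Arrays;
import java.util.List;

class MergedHasBlocks {

  // Hypothetical stand-in for leaf metadata carrying the proposed flag.
  static final class LeafMeta {
    private final boolean hasBlocks;

    LeafMeta(boolean hasBlocks) {
      this.hasBlocks = hasBlocks;
    }

    boolean hasBlocks() {
      return hasBlocks;
    }
  }

  // The merged segment is treated as having blocks if any input leaf has blocks.
  static boolean mergedHasBlocks(List<LeafMeta> leaves) {
    for (LeafMeta leaf : leaves) {
      if (leaf.hasBlocks()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    List<LeafMeta> leaves = Arrays.asList(new LeafMeta(false), new LeafMeta(true));
    System.out.println(mergedHasBlocks(leaves));
  }
}
```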