vigyasharma commented on code in PR #892: URL: https://github.com/apache/lucene/pull/892#discussion_r878666817
########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java: ########## @@ -553,14 +554,20 @@ private void copyChunks( final long toPointer = toDocID == sub.maxDoc ? reader.getMaxPointer() : index.getStartPointer(toDocID); if (fromPointer < toPointer) { - if (numBufferedDocs > 0) { - flush(true); - } final IndexInput rawDocs = reader.getFieldsStream(); rawDocs.seek(fromPointer); do { final int base = rawDocs.readVInt(); final int code = rawDocs.readVInt(); + final boolean dirtyChunk = (code & 2) != 0; + if (copyDirtyChunks) { + if (numBufferedDocs > 0) { + flush(true); + } + } else if (dirtyChunk || numBufferedDocs > 0) { + // Don't copy a dirty chunk or force a flush, which would create a dirty chunk + break; + } Review Comment: Is it that if buffered documents are present, it would create a flush, which would create a dirty chunk? And so, in a `CLEAN_BULK`, we need to break out of the loop and copy docs one at a time? How do you feel about rewriting this if-else as something like: ```java if (sub.mergeStrategy == MergeStrategy.CLEAN_BULK && (dirtyChunk || numBufferedDocs > 0)) { break; } // and then proceed with regular chunk copy code... ... if (numBufferedDocs > 0) { flush(true); } ... ``` Essentially, if we wanted a `CLEAN_BULK`, we break out at the first possibility of dirty chunks - i.e. when we actually see a dirty chunk, or if there are buffered docs present. Did I miss any conditions here? If not, do you think this would make it easier on the eyes? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org