vigyasharma commented on code in PR #892:
URL: https://github.com/apache/lucene/pull/892#discussion_r878666817


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsWriter.java:
##########
@@ -553,14 +554,20 @@ private void copyChunks(
     final long toPointer =
         toDocID == sub.maxDoc ? reader.getMaxPointer() : 
index.getStartPointer(toDocID);
     if (fromPointer < toPointer) {
-      if (numBufferedDocs > 0) {
-        flush(true);
-      }
       final IndexInput rawDocs = reader.getFieldsStream();
       rawDocs.seek(fromPointer);
       do {
         final int base = rawDocs.readVInt();
         final int code = rawDocs.readVInt();
+        final boolean dirtyChunk = (code & 2) != 0;
+        if (copyDirtyChunks) {
+          if (numBufferedDocs > 0) {
+            flush(true);
+          }
+        } else if (dirtyChunk || numBufferedDocs > 0) {
+          // Don't copy a dirty chunk or force a flush, which would create a 
dirty chunk
+          break;
+        }

Review Comment:
   Is it that if buffered documents are present, it would create a flush, which 
would create a dirty chunk? And so, in a `CLEAN_BULK`, we need to break out of 
the loop and copy docs one at a time?
   
   How do you feel about rewriting this if-else as something like:
   ```java
   if (sub.mergeStrategy == MergeStrategy.CLEAN_BULK 
       && (dirtyChunk || numBufferedDocs > 0)) {
     break;
   }
   // and then proceed with regular chunk copy code...
   ...
   if (numBufferedDocs > 0) {
     flush(true);
   }
   ...
   ```
   Essentially, if we wanted a `CLEAN_BULK`, we break out at the first 
possibility of dirty chunks - i.e. when we actually see a dirty chunk, or if 
there are buffered docs present.
   
   Did I miss any conditions here? If not, do you think this would make it 
easier on the eyes?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to