mikemccand commented on code in PR #12872:
URL: https://github.com/apache/lucene/pull/12872#discussion_r1426907008


##########
lucene/core/src/java/org/apache/lucene/index/SegmentInfos.java:
##########
@@ -389,13 +389,39 @@ private static void parseSegmentInfos(
     }
 
     long totalDocs = 0;
+    SegmentInfo info;
     for (int seg = 0; seg < numSegments; seg++) {
       String segName = input.readString();
       byte[] segmentID = new byte[StringHelper.ID_LENGTH];
       input.readBytes(segmentID, 0, segmentID.length);
       Codec codec = readCodec(input);
-      SegmentInfo info =
-          codec.segmentInfoFormat().read(directory, segName, segmentID, 
IOContext.READ);
+      try {
+        info = codec.segmentInfoFormat().read(directory, segName, segmentID, 
IOContext.READ);
+      } catch (FileNotFoundException | NoSuchFileException e) {
+        // https://github.com/apache/lucene/issues/7820 - If we throw a 
CorruptIndex exception here,
+        // a missing segment info will cause parseSegmentInfos() to 
short-circuit and stop reading
+        // the rest of the segments. So, when encountering a segment with a 
missing
+        // segment info (.si) file, we initialize an empty segment info 
(containing 0 docs)
+        // to keep track of the segment name, so that we can continue reading 
other segments, and
+        // the segment with the missing .si file can be later validated or 
exorcised by CheckIndex()
+        info =

Review Comment:
   > * Throwing an exception in `CheckIndex` short-circuits the flow that makes 
it possible for `-exorcise` to do it's thing
   
   Hmm this should still work.  The exception should be thrown at a low level, 
when the codec is trying to read the `.si` file.  This is then caught by 
`CheckIndex`, and it knows which segment is affected by checking `instanceof` 
of the exception and then extracting the segment name.  Then it can report the 
root cause and/or exorcise just fine?
   
   > * Since we don't do a deeper level check on segments which aren't the 
latest segment, the recently added unit test where we expect prior commits to 
throw an exception if corrupt, does not pass
   
   Hmm why is that?  Codec should throw an exception when we try to load that 
(non-latest) SIS too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to