gokaai commented on code in PR #12530:
URL: https://github.com/apache/lucene/pull/12530#discussion_r1322006478


##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -610,6 +610,39 @@ public Status checkIndex(List<String> onlySegments, ExecutorService executorServ
       return result;
     }
 
+    // https://github.com/apache/lucene/issues/7820: also attempt to open any older commit
+    // points (segments_N), which will catch certain corruption like missing _N.si files
+    // for segments not also referenced by the newest commit point (which was already
+    // loaded, successfully, above).  Note that we do not do a deeper check of segments
+    // referenced ONLY by these older commit points, because such corruption would not
+    // prevent a new IndexWriter from opening on the newest commit point.  but it is still
+    // corruption, e.g. a reader opened on those old commit points can hit corruption
+    // exceptions which we (still) will not detect here.  progress not perfection!
+
+    for (String fileName : files) {
+      if (fileName.startsWith(IndexFileNames.SEGMENTS)
+          && fileName.equals(lastSegmentsFile) == false) {

Review Comment:
   We can avoid having a [separate block specifically for validating the `lastSegmentsFile`](https://github.com/apache/lucene/blob/f96b84321806a3f6aac79dd17fd656636250fb78/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L597-L611) by using this conditional to determine:
   1. [The error message](https://github.com/apache/lucene/blob/f96b84321806a3f6aac79dd17fd656636250fb78/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L634-L638) (i.e., whether the file that could not be read is the latest commit point or an older one), and
   2. [Whether the commit that was read should be assigned to `sis`, which is then checked more deeply](https://github.com/apache/lucene/blob/f96b84321806a3f6aac79dd17fd656636250fb78/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L600).
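   A minimal, self-contained sketch of the suggested refactoring, assuming a single loop over every `segments_N` file where one `isLatest` flag drives both decisions. It does not use Lucene: `CommitReader` is a hypothetical stand-in for `SegmentInfos.readCommit(Directory, String)`, `Result` is a hypothetical holder for `sis` and the error messages, and the message wording is illustrative rather than `CheckIndex`'s actual text.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: all names here are hypothetical, not Lucene APIs. */
public class CommitCheckSketch {

  /** Hypothetical stand-in for SegmentInfos.readCommit(dir, fileName). */
  public interface CommitReader<T> {
    T read(String fileName) throws IOException;
  }

  /** Hypothetical result holder: the latest commit plus any error messages. */
  public static final class Result<T> {
    public T sis; // latest commit, to be checked more deeply later; null if unreadable
    public final List<String> errors = new ArrayList<>();
  }

  // Prefix of commit point files; mirrors IndexFileNames.SEGMENTS ("segments").
  static final String SEGMENTS = "segments";

  public static <T> Result<T> readAllCommits(
      List<String> files, String lastSegmentsFile, CommitReader<T> reader) {
    Result<T> result = new Result<>();
    for (String fileName : files) {
      if (!fileName.startsWith(SEGMENTS)) {
        continue; // not a commit point file
      }
      // The single conditional that drives both decisions below.
      boolean isLatest = fileName.equals(lastSegmentsFile);
      try {
        T commit = reader.read(fileName);
        if (isLatest) {
          // Decision 2: only the latest commit becomes sis and is checked deeper.
          result.sis = commit;
        }
      } catch (IOException e) {
        // Decision 1: the error message depends on which commit point failed.
        String which = isLatest ? "latest" : "older";
        result.errors.add(
            "ERROR: could not read " + which + " commit point \"" + fileName + "\": "
                + e.getMessage());
      }
    }
    return result;
  }
}
```

   For example, with `segments_1` and `segments_2` present and only `segments_1` unreadable, `sis` holds the `segments_2` commit and one "older" error is recorded. The existing block linked above appears to return early when the latest commit cannot be read; this sketch simply records the error and keeps going, so a real change would still need to preserve that early exit.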



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
