gokaai commented on code in PR #12530: URL: https://github.com/apache/lucene/pull/12530#discussion_r1322006478
##########
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##########
@@ -610,6 +610,39 @@ public Status checkIndex(List<String> onlySegments, ExecutorService executorServ
       return result;
     }
+    // https://github.com/apache/lucene/issues/7820: also attempt to open any older commit
+    // points (segments_N), which will catch certain corruption like missing _N.si files
+    // for segments not also referenced by the newest commit point (which was already
+    // loaded, successfully, above). Note that we do not do a deeper check of segments
+    // referenced ONLY by these older commit points, because such corruption would not
+    // prevent a new IndexWriter from opening on the newest commit point. but it is still
+    // corruption, e.g. a reader opened on those old commit points can hit corruption
+    // exceptions which we (still) will not detect here. progress not perfection!
+
+    for (String fileName : files) {
+      if (fileName.startsWith(IndexFileNames.SEGMENTS)
+          && fileName.equals(lastSegmentsFile) == false) {

Review Comment:
We can avoid having a [separate block specifically for validating the `lastSegmentsFile`](https://github.com/apache/lucene/blob/f96b84321806a3f6aac79dd17fd656636250fb78/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L597-L611) by using this conditional to determine:

1. [The error message](https://github.com/apache/lucene/blob/f96b84321806a3f6aac79dd17fd656636250fb78/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L634-L638) (i.e., whether the commit that could not be read is the latest one or an older one)
2. [Whether or not the commit read should be set as the value for `sis`, which will be checked deeper](https://github.com/apache/lucene/blob/f96b84321806a3f6aac79dd17fd656636250fb78/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java#L600)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
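For illustration, the single-loop shape suggested in the review comment above might look roughly like the sketch below. It is deliberately free of Lucene dependencies: `CommitPointCheckSketch`, `CommitReader`, `Result`, and the message strings are hypothetical stand-ins (the real code would read each commit into a `SegmentInfos` and record failures on `CheckIndex.Status`), and `SEGMENTS_PREFIX` is assumed to mirror `IndexFileNames.SEGMENTS`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual CheckIndex code.
final class CommitPointCheckSketch {

  // Assumed to mirror IndexFileNames.SEGMENTS.
  static final String SEGMENTS_PREFIX = "segments";

  /** Hypothetical stand-in for reading one commit point; throws if it is unreadable. */
  interface CommitReader {
    String read(String fileName) throws Exception;
  }

  /** Hypothetical stand-in for the parts of the status this loop would fill in. */
  static final class Result {
    String newestCommit; // plays the role of `sis`; stays null if the newest commit is unreadable
    final List<String> errors = new ArrayList<>();
  }

  static Result checkCommits(List<String> files, String lastSegmentsFile, CommitReader reader) {
    Result result = new Result();
    for (String fileName : files) {
      if (!fileName.startsWith(SEGMENTS_PREFIX)) {
        continue; // not a commit point
      }
      // One conditional drives both decisions named in the review comment.
      boolean isNewest = fileName.equals(lastSegmentsFile);
      try {
        String commit = reader.read(fileName);
        if (isNewest) {
          // Only the newest commit is kept for the deeper per-segment checks.
          result.newestCommit = commit;
        }
      } catch (Exception e) {
        // The same flag selects the wording of the error message.
        result.errors.add(
            "could not read "
                + (isNewest ? "newest" : "older")
                + " commit point "
                + fileName
                + ": "
                + e.getMessage());
      }
    }
    return result;
  }
}
```

Whether a failure on the newest commit should abort the check immediately, as the existing separate block does by returning early, is left open in this sketch; it only collects the error.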
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org