mikemccand opened a new pull request, #12530:
URL: https://github.com/apache/lucene/pull/12530

   ### Description
   
   Relates #7820.
   
   `CheckIndex` today only detects and exorcises corruption in the latest 
commit point, yet `IndexWriter` will be angry on init if older commit points 
exist and have "on load" corruption (e.g. missing `_N.si` files).
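
   For readers less familiar with commit points: each one is recorded in the 
index directory as a `segments_N` file, where `N` is the commit generation 
written in base 36. The sketch below is only an illustration of how several 
commit points can sit side by side in one directory. The class 
`CommitPointNames`, its methods, and the sample file list are hypothetical and 
are not part of Lucene or of this PR; the code parses file names only and does 
not load or verify any commit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not a Lucene class: it inspects file names only.
public class CommitPointNames {
  static final String PREFIX = "segments_";

  /** Returns the generation encoded in a segments_N name, or -1 if the name is not a commit file. */
  static long generationOf(String fileName) {
    if (!fileName.startsWith(PREFIX)) {
      return -1L;
    }
    try {
      // The generation suffix is written in base 36 (Character.MAX_RADIX).
      return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    } catch (NumberFormatException e) {
      return -1L;
    }
  }

  /** Filters a directory listing down to commit files, ordered oldest generation first. */
  static List<String> commitFilesOldestFirst(List<String> fileNames) {
    List<String> commits = new ArrayList<>();
    for (String name : fileNames) {
      if (generationOf(name) >= 0) {
        commits.add(name);
      }
    }
    commits.sort(Comparator.comparingLong(CommitPointNames::generationOf));
    return commits;
  }

  public static void main(String[] args) {
    // Invented listing: two commit points plus segment files and the write lock.
    List<String> files =
        List.of("_0.si", "_0.cfs", "segments_2", "_1.si", "_1.cfs", "segments_1", "write.lock");
    List<String> commits = commitFilesOldestFirst(files);
    for (int i = 0; i < commits.size(); i++) {
      String role = (i == commits.size() - 1) ? "latest" : "older";
      System.out.println(commits.get(i) + " (" + role + " commit point)");
    }
  }
}
```

   In a listing like this, the older `segments_1` commit point is the kind that 
the description above says `CheckIndex` did not examine.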
   
   This is badly inconsistent: if `IndexWriter` or `IndexReader` would hit 
index corruption, `CheckIndex` should always be able to find it too.  And it is 
conceivable, though exceptionally unlikely, for an index to legitimately get 
into this broken state on power loss or an OS/JVM crash at just the right/wrong 
time during `IndexWriter.commit`.
   
   This PR is just a first step: it fixes the detection bug in `CheckIndex` 
and adds a new test case, carried over and iterated on from [this nice test 
case](https://github.com/apache/lucene/issues/7009#issuecomment-1223544484) 
(thank you @buzztaiki!).  So `CheckIndex` will now catch this exotic form of 
corruption.
   
   But it does not yet fix `-exorcise` to be able to correct such a situation.  
That's trickier, especially for missing `_N.si` files, since `CheckIndex 
-exorcise` cannot correct that error even on the latest commit point.  
Progress not perfection!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

