mikemccand commented on issue #7820:
URL: https://github.com/apache/lucene/issues/7820#issuecomment-1684924512

   Hi @SevenCss, indeed I think there is a bug in `CheckIndex` here, because 
`IndexWriter` (correctly) cannot open the index yet `CheckIndex` can't find any 
corruption.
   
   *First off, please take a full backup of your index before trying the steps 
below!*
   
   It sounds like you have a working commit point (`segments_a8`) and a 
broken one (`segments_a7`).  To recover your index, after making a full 
backup of your index!!, and with no `IndexReader`/`IndexWriter` open on the 
index, manually delete `segments_a7`.  `IndexWriter` should then be able to 
open the index and delete the now-unreferenced files correctly.  Then close 
`IndexWriter` and confirm it can open the index again and continue indexing 
documents.  If so, your index should be recovered.
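   
   As a rough illustration of the backup-then-delete step, here is a minimal 
sketch using only `java.nio.file`.  The class and method names 
(`CommitPointRemover`, `backup`, `removeCommitPoint`) are hypothetical and not 
part of Lucene, and it assumes the index directory is flat (no 
subdirectories) and that nothing has the index open.  Opening `IndexWriter` 
afterwards is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class CommitPointRemover {

  /** Copies every regular file in indexDir into backupDir (assumes a flat directory). */
  static void backup(Path indexDir, Path backupDir) throws IOException {
    Files.createDirectories(backupDir);
    try (Stream<Path> files = Files.list(indexDir)) {
      for (Path file : (Iterable<Path>) files::iterator) {
        if (Files.isRegularFile(file)) {
          Files.copy(file, backupDir.resolve(file.getFileName().toString()),
              StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }

  /** Takes the backup first, then deletes only the named commit file. */
  static void removeCommitPoint(Path indexDir, Path backupDir, String segmentsFile)
      throws IOException {
    if (!segmentsFile.startsWith("segments_")) {
      throw new IllegalArgumentException("not a commit file: " + segmentsFile);
    }
    backup(indexDir, backupDir);
    // Throws NoSuchFileException if the commit file is not there.
    Files.delete(indexDir.resolve(segmentsFile));
  }
}
```

   With the file names from this issue, the call would look like 
`CommitPointRemover.removeCommitPoint(indexDir, backupDir, "segments_a7")`.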
   
   Second off, I'm curious how your index got into this state -- did your 
indexing process suffer an OS or JVM crash, or power loss?  Is your index on a 
mounted drive whose remote file server crashed?  Or did a network hiccup 
disconnect and reconnect the mounted drive?  Do you use any index replication 
that copies the new segments of an index between machines?  Windows is tricky 
for Lucene because still-open files cannot be deleted or unlinked ... it 
causes "fun" issues sometimes.
   
   Third off, there is possibly a separate improvement we could make to 
`IndexWriter`, to remove `segments_N` files before removing all other files 
when a commit point is deleted, to try to reduce the chance of an index getting 
into this state.  That has a nice symmetry with how we write a commit (write 
various files first, and only when that succeeds do we write and fsync the 
`segments_N` referencing them).  I'll open a follow-on issue for that.  Let's 
focus for this issue on fixing this bug in `CheckIndex`.
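   
   To make the proposed ordering concrete, here is a small sketch of 
"delete `segments_N` first, then everything else".  It is not how 
`IndexWriter` is implemented: `CommitDeletionOrder` and its methods are 
hypothetical names, and the sketch ignores the fact that files can be shared 
between commit points, so a real implementation would only delete files that 
no remaining commit references.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/** Hypothetical sketch for illustration only; not Lucene's deletion logic. */
final class CommitDeletionOrder {

  /** Returns the given files with any segments_N file(s) moved to the front. */
  static List<String> deletionOrder(Collection<String> commitFiles) {
    List<String> ordered = new ArrayList<>();
    for (String name : commitFiles) {
      if (name.startsWith("segments_")) {
        ordered.add(name);
      }
    }
    for (String name : commitFiles) {
      if (!name.startsWith("segments_")) {
        ordered.add(name);
      }
    }
    return ordered;
  }

  /**
   * Deletes the files of one commit, commit file first. If the process dies
   * partway through, no segments_N is left pointing at already-deleted files.
   */
  static void deleteCommit(Path indexDir, Collection<String> commitFiles) throws IOException {
    for (String name : deletionOrder(commitFiles)) {
      Files.deleteIfExists(indexDir.resolve(name));
    }
  }
}
```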
   
   In #7009, @buzztalk made a [nice test case that we can start 
from](https://github.com/apache/lucene/issues/7009#issuecomment-1223544484).  
The fix seems "simple" -- if there is a working `segments_N`, `CheckIndex` 
should additionally detect when other commit points (`segments_N`) fail to 
open, and remove those broken commit points.  But maybe there was some wrinkle 
that prevented us from doing this in the past ...
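
   The detection logic described above could be sketched roughly as follows.  
This is not `CheckIndex` code: `BrokenCommitFinder` is a hypothetical name, 
and the `opens` predicate is a stand-in for whatever check decides that a 
commit point can be read.  Broken commits are only reported when at least one 
other commit opens, so the last remaining commit is never flagged for removal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch for illustration only; not part of CheckIndex. */
final class BrokenCommitFinder {

  /**
   * Returns the segments_N files that fail to open, but only when at least
   * one other segments_N opens; otherwise returns an empty list.
   *
   * @param fileNames all file names in the index directory
   * @param opens     assumed check that a given commit file can be read
   */
  static List<String> brokenCommits(List<String> fileNames, Predicate<String> opens) {
    List<String> working = new ArrayList<>();
    List<String> broken = new ArrayList<>();
    for (String name : fileNames) {
      if (!name.startsWith("segments_")) {
        continue; // not a commit point
      }
      if (opens.test(name)) {
        working.add(name);
      } else {
        broken.add(name);
      }
    }
    return working.isEmpty() ? Collections.<String>emptyList() : broken;
  }
}
```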


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

