mikemccand commented on issue #7820: URL: https://github.com/apache/lucene/issues/7820#issuecomment-1684924512
Hi @SevenCss, indeed I think there is a bug in `CheckIndex` here, because `IndexWriter` (correctly) cannot open the index yet `CheckIndex` can't find any corruption.

*First off, please take a full backup of your index before trying the steps below!* It sounds like you have a working commit point with `segments_a8` and a broken one with `segments_a7` so to recover your index, after making a full backup of your index!!, and with no open `IndexReader`/`IndexWriter` on the index, manually delete `segments_a7`. `IndexWriter` should be able to open the index and then delete the now unreferenced files correctly. Then close `IndexWriter` and confirm it can again open the index and continue indexing documents. If so, your index should be recovered.

Second off, I'm curious how your index got into this state -- did you suffer an OS or JVM crash, or power loss, or so in your indexing process? Is your index on a mounted drive and that remote file server crashed or so? Or a network hiccup disconnected and reconnected the mounted drive? Do you have any interesting index replication to copy the new segments of an index between machines or so? Windows is tricky for Lucene because still-open files cannot be deleted nor unlinked ... it causes "fun" issues sometimes.

Third off, there is possibly a separate improvement we could make to `IndexWriter`, to remove `segments_N` files before removing all other files when a commit point is deleted, to try to reduce the chance of an index getting into this state. That has a nice symmetry with how we write a commit (write various files first, and only when that succeeds do we write and fsync the `segments_N` referencing them). I'll open a follow-on issue for that.

Let's focus for this issue on fixing this bug in `CheckIndex`. In #7009, @buzztalk made a [nice test case that we can start from](https://github.com/apache/lucene/issues/7009#issuecomment-1223544484).
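As a rough illustration of the manual recovery steps above (back up first, then identify the older `segments_N`), here is a minimal sketch that uses only `java.nio.file` and no Lucene classes. The class `CommitPointFiles` and its methods are hypothetical helpers, not part of Lucene. It assumes a flat index directory and that the suffix after `segments_` is the commit generation written in base 36, so `segments_a7` is older than `segments_a8`. It deliberately deletes nothing: removing the stale file stays a manual step, done only after confirming the backup and with no open `IndexReader`/`IndexWriter`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration only; not a Lucene API.
final class CommitPointFiles {

  /** Returns the base-36 generation of a name like "segments_a7", or -1 for any other file. */
  static long generationOf(String fileName) {
    if (!fileName.matches("segments_[0-9a-z]+")) {
      return -1;
    }
    try {
      return Long.parseLong(fileName.substring("segments_".length()), Character.MAX_RADIX);
    } catch (NumberFormatException e) {
      return -1; // suffix too long to be a valid generation
    }
  }

  /** Lists the segments_N files in the directory, oldest generation first. */
  static List<String> commitFiles(Path indexDir) throws IOException {
    List<String> names = new ArrayList<>();
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
      for (Path p : stream) {
        String name = p.getFileName().toString();
        if (generationOf(name) >= 0) {
          names.add(name);
        }
      }
    }
    names.sort(Comparator.comparingLong(CommitPointFiles::generationOf));
    return names;
  }

  /** Copies every regular file of the (flat) index directory into backupDir. */
  static void backup(Path indexDir, Path backupDir) throws IOException {
    Files.createDirectories(backupDir);
    try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir)) {
      for (Path p : stream) {
        if (Files.isRegularFile(p)) {
          Files.copy(p, backupDir.resolve(p.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    if (args.length != 2) {
      System.err.println("usage: CommitPointFiles <indexDir> <backupDir>");
      return;
    }
    Path indexDir = Paths.get(args[0]);
    backup(indexDir, Paths.get(args[1]));
    // Only reports; which file to delete is left to the operator.
    System.out.println("Commit points, oldest first: " + commitFiles(indexDir));
  }
}
```

Run with the index directory and a backup directory as arguments. The printed list only shows the generation order; whether the older commit point is really the broken one still has to be confirmed by trying to open the index, as described above.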
The fix seems "simple": if there is a working `segments_N`, `CheckIndex` should additionally detect when other commit points (`segments_N`) fail to open, and remove those broken commit points. But maybe there was some wrinkle that prevented us from doing this in the past ...

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org