Usually this is caused by one of:
1> the file on disk getting corrupted, e.g. the disk going bad.
2> the disk filling up at some point, leaving a partially written segment.

No, you cannot delete the .cfs file and re-index only the documents
that were in it, because you have no way of knowing exactly which
documents those are. Segments are merged in the background as
part of normal indexing, so figuring out which docs ended up in a
given segment isn’t really possible. (OK, it’s deterministic, but there
are so many variables that it might as well be impossible.)

CheckIndex -fix will remove the corrupted segments, leaving holes
in your index. You can’t just delete the .cfs file yourself, because the
segments file, which tells Lucene which segments are current, still
references it. But CheckIndex will take care of both parts for you
(example below).
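
For reference, CheckIndex is run from the Lucene core jar, something
like this (jar version and index path here are illustrative, and recent
Lucene versions spell the flag -exorcise rather than -fix; stop Solr
before running it against a live index):

  java -cp lucene-core-8.5.1.jar org.apache.lucene.index.CheckIndex \
    /path/to/data/index -exorcise

Run it without the flag first to get a read-only report of what’s
broken, and back up the index directory before letting it rewrite
anything, since the dropped segments (and the docs in them) are gone
for good.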

If you really can’t re-index everything, you could certainly use a
streaming expression to get a list of all the IDs in the index (sketched
below), compare that against your DB and index only the difference, but
whether that’s more work than just reindexing anyway, I don’t know.
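
A minimal sketch; the collection name and host/port are placeholders,
and it assumes your id field has docValues, which the /export handler
needs in order to stream every ID efficiently:

  curl --data-urlencode \
    'expr=search(yourCollection, q="*:*", fl="id", sort="id asc", qt="/export")' \
    http://localhost:8983/solr/yourCollection/stream

That streams back every ID in the collection as JSON tuples, which you
can then diff against the IDs in your database.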

You don’t say whether you’re using SolrCloud or not, but if you are _and_
if you have more than one replica, just DELETEREPLICA the bad one and use
ADDREPLICA to put it back (see the commands below). It’ll sync with the
leader automatically.
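
Something like this via the Collections API; the collection, shard and
replica names here are placeholders, and you’d get the real core_nodeN
name from the admin UI or a CLUSTERSTATUS call:

  curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=yourCollection&shard=shard1&replica=core_node5'
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=yourCollection&shard=shard1'

The new replica does a full recovery from the leader, so you end up with
a clean copy of the index without reindexing anything.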

Best,
Erick

> On May 19, 2020, at 9:33 AM, nettadalet <nsteinb...@dalet.com> wrote:
> 
> I get the following exception:
> Caused by: org.apache.lucene.index.CorruptIndexException: length should be
> 104004663 bytes, but is 104856631 instead
> (resource=MMapIndexInput(path="path_to_index\index\_jlp.cfs"))
> 
> What may be the cause of this?
> How can the length of the .cfs file change so that it becomes corrupted?
> Can I simply delete this .cfs file and then synchronize the index against
> the database, so that only the missing files are indexed, instead of
> reindexing all the files?
> 
> Thanks in advance.
> 
> 
> 
