Usually this is caused by one of:

1> the file on disk getting corrupted, i.e. the disk going bad
2> the disk getting full at some point and writing a partial segment
No, you cannot delete the .cfs file and re-index only the documents that were in it, because you have no way of knowing exactly what those documents are. Segments are merged in the background as part of normal indexing, so figuring out which docs were in a particular segment isn't really possible. (OK, it's determinate, but there are so many variables that it might as well be impossible.)

CheckIndex -fix will remove the corrupted segments, leaving holes in your index. You can't just delete the .cfs file yourself, because the segments file that tells Lucene which segments are current still references it. CheckIndex takes care of both parts for you.

If you really can't re-index everything, you could certainly use a streaming expression to get a list of all the IDs in the index, compare that against your DB, and only index the difference, but whether that's more work than just reindexing anyway I don't know.

You don't say whether you're using SolrCloud or not, but if you are _and_ if you have more than one replica, just DELETEREPLICA on the bad one and use ADDREPLICA to put it back. It'll sync with the leader automatically.

Example commands for each of these options are sketched below your quoted message.

Best,
Erick

> On May 19, 2020, at 9:33 AM, nettadalet <nsteinb...@dalet.com> wrote:
>
> I get the following exception:
> Caused by: org.apache.lucene.index.CorruptIndexException: length should be
> 104004663 bytes, but is 104856631 instead
> (resource=MMapIndexInput(path="path_to_index\index\_jlp.cfs"))
>
> What may be the cause of this?
> How can the length of the .cfs file change so that it becomes corrupted?
> Can I simply delete this .cfs file and then synchronize the index against
> the database, so only the missing files will be indexed, instead of
> reindexing all the files?
>
> Thanks in advance.
>
>
>
> --
> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
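P.S. Here's a sketch of the CheckIndex invocation; the jar version and paths are placeholders, so adjust them to match your install (Solr ships lucene-core under server/solr-webapp/webapp/WEB-INF/lib/). Note that recent Lucene versions spell the repair option -exorcise rather than -fix:

    # Dry run first: reports corruption without modifying the index
    java -cp lucene-core-8.5.1.jar org.apache.lucene.index.CheckIndex /path_to_index/index

    # Only if you accept losing every doc in the bad segment(s):
    java -cp lucene-core-8.5.1.jar org.apache.lucene.index.CheckIndex /path_to_index/index -exorcise

Take a backup of the index directory before running the second form; it permanently drops the corrupt segments.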
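If you go the streaming expression route, a minimal sketch, assuming your collection is named "collection1" and your id field has docValues (the /export handler requires docValues on any field it returns or sorts on):

    curl --data-urlencode 'expr=search(collection1,
                                       q="*:*",
                                       fl="id",
                                       sort="id asc",
                                       qt="/export")' \
         "http://localhost:8983/solr/collection1/stream"

That streams back one tuple per document. Diff the IDs against your DB and index only the difference.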
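And for the SolrCloud route, the Collections API calls; the collection, shard, and replica names below are placeholders (CLUSTERSTATUS or the admin UI will show you the core_node name of the bad replica):

    # Drop the replica holding the corrupt index
    curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=myCollection&shard=shard1&replica=core_node5"

    # Put it back; it does a full recovery from the leader
    curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=myCollection&shard=shard1"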