Hi,
We've just been working with a client who had a corruption issue with
their SolrCloud install. They're running Solr 5.3.1, with a collection
spread across 12 shards. Each shard has a single replica.
They were seeing "Index Corruption" errors when running certain queries.
We investigated, and narrowed it down to a single shard. Using the
Lucene CheckIndex utility, we tested both the primary and replica copies
of the data, and found the same issue with both - the first segment,
containing the majority of the documents, was reporting corruption. They
were able to restore from a backup, but it would be good to get some
idea of what could have caused the problem in SolrCloud. One of the
machines ran out of disk space last week during indexing, which we guess
could have been the starting point for the corrupted data files.
Our question is: why would the corruption have spread to the replica as
well? Could a corrupted document be replicated and break the replica
index too?
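One way to narrow this down would be to check whether the two copies of
the shard hold byte-identical index files. If the damaged segment files
match exactly, that would suggest the files were copied wholesale from
one node to the other rather than corrupted independently. Below is a
minimal sketch of that comparison, assuming both index directories can
be read from one machine. The function names and the two paths passed on
the command line are hypothetical, and it only compares checksums; it
does not replace CheckIndex.

```python
import hashlib
import sys
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def digest_dir(index_dir):
    """Map each regular file name in index_dir to its digest."""
    return {
        p.name: file_digest(p)
        for p in Path(index_dir).iterdir()
        if p.is_file()
    }


def compare_index_dirs(primary_dir, replica_dir):
    """Classify index files as identical, differing, or present on one side only."""
    primary = digest_dir(primary_dir)
    replica = digest_dir(replica_dir)
    shared = sorted(primary.keys() & replica.keys())
    return {
        "identical": [n for n in shared if primary[n] == replica[n]],
        "differing": [n for n in shared if primary[n] != replica[n]],
        "only_primary": sorted(primary.keys() - replica.keys()),
        "only_replica": sorted(replica.keys() - primary.keys()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_index.py <primary_index_dir> <replica_index_dir>
    result = compare_index_dirs(sys.argv[1], sys.argv[2])
    for label, names in result.items():
        print(f"{label}: {', '.join(names) if names else '(none)'}")
```

If the files belonging to the corrupt segment show up under "identical",
the same bytes are present on both nodes.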
Thanks,
Matt
--
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk