While trying to upgrade a 100 GB index from Solr 4 to Solr 5, the index check (actually, the upgrade tool) reported that the index is corrupted. I therefore ran CheckIndex to fix the index; it showed a broken-segment warning and then removed the documents in those segments. I then ran the index upgrade on the fixed index, which completed without any errors (I still need to set up Solr/ZK to test it, though).
The warning was:

WARNING: 2 broken segments (containing 50000 documents) detected

Is there an easy way to find out which documents (by ID) were deleted, or do I need to compare the document IDs in the old and new indexes?

Also, what do broken segments mean for querying? Are the documents in them still searchable in the corrupted index as long as the segments are not deleted?

Note that a few small test indexes had no corruption or upgrade issues. The problem with the large index could be related to memory or network issues.

Thanks in advance.
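
P.S. For reference, the fallback comparison I have in mind would look roughly like the sketch below. It is only an illustration in Python, not something Solr or Lucene provides: it assumes the document IDs for each side have already been exported to plain text files, one ID per line. Since IDs in the broken segments may not be readable from the old index, the "old" list may have to come from the source system instead. The file names and helper names are hypothetical.

```python
def read_ids(path):
    """Return the set of non-empty, whitespace-stripped lines in an ID file."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def missing_ids(old_ids, new_ids):
    """Return the IDs present in old_ids but absent from new_ids, sorted."""
    return sorted(set(old_ids) - set(new_ids))


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python diff_ids.py old_ids.txt new_ids.txt
    if len(sys.argv) == 3:
        for doc_id in missing_ids(read_ids(sys.argv[1]), read_ids(sys.argv[2])):
            print(doc_id)
```

Both ID sets are held in memory, which should be workable for a few tens of millions of short IDs but may need a sorted-file merge for anything much larger.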