Thanks Lance and Michael,
We are running Solr 1.3.0.2009.09.03.11.14.39 (complete version info from the Solr admin panel is appended below). I tried running CheckIndex (with the -ea: switch) on one of the shards. CheckIndex also produced an ArrayIndexOutOfBoundsException on the larger segment, which contains 500K+ documents (complete CheckIndex output appended below).

Is it likely that all 10 shards are corrupted? Is it possible that we have simply exceeded some Lucene limit? I'm wondering if we could have exceeded the Lucene limit of 2.1 billion unique terms mentioned towards the end of the Lucene Index File Formats document. If the small 731-document index has nine million unique terms as reported by CheckIndex, then even though many terms are repeated, it is conceivable that the 500,000-document index could have more than 2.1 billion terms. Do you know if the number of terms reported by CheckIndex is the number of unique terms?

On the other hand, we previously optimized a 1 million document index down to 1 segment and had no problems. That was with an earlier version of Solr and did not include CommonGrams, which could conceivably increase the number of terms in the index by 2 or 3 times.

Tom

-----------------------------------------------------------------------------------
Solr Specification Version: 1.3.0.2009.09.03.11.14.39
Solr Implementation Version: 1.4-dev 793569 - root - 2009-09-03 11:14:39
Lucene Specification Version: 2.9-dev
Lucene Implementation Version: 2.9-dev 779312 - 2009-05-27 17:19:55

[tburt...@slurm-4 ~]$ java -Xmx4096m -Xms4096m -cp /l/local/apache-tomcat-serve/webapps/solr-sdr-search/serve-10/WEB-INF/lib/lucene-core-2.9-dev.jar:/l/local/apache-tomcat-serve/webapps/solr-sdr-search/serve-10/WEB-INF/lib -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /l/solrs/1/.snapshot/serve-2010-02-07/data/index

Opening index @ /l/solrs/1/.snapshot/serve-2010-02-07/data/index

Segments file=segments_zo numSegments=2 version=FORMAT_DIAGNOSTICS [Lucene 2.9]
  1 of 2: name=_29dn docCount=554799
    compound=false
    hasProx=true
    numFiles=9
    size (MB)=267,131.261
    diagnostics = {optimize=true, mergeFactor=2, os.version=2.6.18-164.6.1.el5, os=Linux, mergeDocStores=true, lucene.version=2.9-dev 779312 - 2009-05-27 17:19:55, source=merge, os.arch=amd64, java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_29dn_7.del]
    test: open reader.........OK [184 deleted docs]
    test: fields, norms.......OK [6 fields]
    test: terms, freq, prox...FAILED
    WARNING: fixIndex() would remove reference to this segment; full exception:
java.lang.ArrayIndexOutOfBoundsException: -16777214
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:246)
        at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:218)
        at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:474)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:715)

  2 of 2: name=_29im docCount=731
    compound=false
    hasProx=true
    numFiles=8
    size (MB)=421.261
    diagnostics = {optimize=true, mergeFactor=3, os.version=2.6.18-164.6.1.el5, os=Linux, mergeDocStores=true, lucene.version=2.9-dev 779312 - 2009-05-27 17:19:55, source=merge, os.arch=amd64, java.version=1.6.0_16, java.vendor=Sun Microsystems Inc.}
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [6 fields]
    test: terms, freq, prox...OK [9504552 terms; 34864047 terms/docs pairs; 144869629 tokens]
    test: stored fields.......OK [3550 total field count; avg 4.856 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

WARNING: 1 broken segments (containing 554615 documents) detected
WARNING: would write new segments file, and 554615 documents would be lost, if -fix were specified

[tburt...@slurm-4 ~]$

> The index is corrupted. In some places ArrayIndex and NPE are not wrapped as
> CorruptIndexException.
>
> Try running your code with the Lucene assertions on. Add this to the JVM
> arguments:
>
> -ea:org.apache.lucene...

--
View this message in context: http://old.nabble.com/TermInfosReader.get-ArrayIndexOutOfBoundsException-tp27506243p27518800.html
Sent from the Solr - User mailing list archive at Nabble.com.
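[Editor's note] The 2.1-billion-term question above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is illustrative only and was not part of the thread: it scales the 9,504,552 unique terms CheckIndex reported for the 731-document segment linearly up to the broken segment's 554,799 documents, and compares the result with Lucene's ~2.1 billion (Integer.MAX_VALUE) unique-term limit. Linear scaling overestimates, since vocabulary growth is sublinear and many terms repeat across documents, so this is only an upper-bound guess, but it shows the limit is at least plausibly in reach.

```java
// Hypothetical back-of-the-envelope check (not run against the actual index).
// Numbers are taken from the CheckIndex output quoted above.
public class TermLimitEstimate {
    public static void main(String[] args) {
        long termsSmall = 9504552L;          // unique terms in the 731-doc segment
        long docsSmall  = 731L;
        long docsLarge  = 554799L;           // docCount of the broken segment
        long termLimit  = Integer.MAX_VALUE; // ~2.1 billion unique terms

        // Naive linear scaling -- an overestimate, since vocabulary
        // growth is sublinear and many terms repeat across documents.
        long naiveEstimate = termsSmall * docsLarge / docsSmall;

        System.out.println("naive upper-bound estimate: " + naiveEstimate); // roughly 7.2 billion
        System.out.println("Lucene term limit:          " + termLimit);
        System.out.println("could exceed limit:         " + (naiveEstimate > termLimit));
    }
}
```

Even allowing for heavy overcounting, a naive estimate several times the limit suggests the unique-term ceiling is worth ruling out before blaming corruption alone.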