We just started hitting a FileNotFoundException, for no apparent reason, on both our regular index and our spellchecker index, only a few minutes after we restarted Solr. I did some searching and didn't find much that helped.

We started to do some load testing, and after about 10 minutes we started getting these errors.

We hit the spellchecker on every request through a SpellcheckComponent that we created (i.e., code ripped out of SpellCheckRequestHandler for now). It runs essentially the same code as the spellcheck request handler when we specify a parameter (spellcheck=true).

We have 34 cores. All but two are fully optimized (they haven't been updated in 2 months); only those two are actively updated. We started Solr around 11:45am, and not much happened until 12:27, when we started load testing (just a few queries, maybe 100 updates).

find /home/dsteiger/local/solr/cores/*/data/index | wc -l  => 414
find /home/dsteiger/local/solr/cores/*/data/spell | wc -l  => 6 (only the two 'active' cores use the spell checker)

So, not many files are open.

Does anyone have any idea what might cause the two errors below? I restarted Solr around 11:45am to test a new patch that sets the mergeFactor in the Lucene spellchecker to 2 instead of 300, because we kept running into 'too many files open' errors when rebuilding more than one spell index at a time. The spell indexes had been rebuilt manually using the mergeFactor of 300, then Solr was restarted, and any subsequent rebuild of a spell index would use a mergeFactor of 2.

After we hit this error, I rebuilt the spell indexes with the new code, replicated them to the slave, and restarted Solr, and all has been well. We ran the load testing for more than an hour and the issue hasn't returned.

Could the old spell indexes that were created with the high mergeFactor cause an issue like this somehow? Could opening and closing searchers so quickly cause it? I don't have the slightest idea. All of our search queries hit the slave, and the master just handles updates. The master had no issues through all of this.

Caused by: java.io.IOException: cannot read directory org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qaa/data/spell: list() returned null
        at org.apache.lucene.index.SegmentInfos.getCurrentSegmentGeneration(SegmentInfos.java:115)
        at org.apache.lucene.index.IndexReader.indexExists(IndexReader.java:506)
        at org.apache.lucene.search.spell.SpellChecker.setSpellIndex(SpellChecker.java:102)
        at org.apache.lucene.search.spell.SpellChecker.<init>(SpellChecker.java:89)
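
For reference, my understanding is that the "list() returned null" part of that message traces back to java.io.File.list(), which is documented to return null when the path is not a directory or when an I/O error occurs. Below is a minimal sketch of that behaviour. The class name, the method name, and the /hypothetical/missing/spell path are made up for illustration; this is not the Lucene code itself, and it doesn't tell us which of the two conditions we actually hit.

```java
import java.io.File;
import java.io.IOException;

public class ListNullDemo {
    // Hypothetical helper: counts the entries in a directory and raises an
    // IOException, worded like the one in the trace, when File.list() returns null.
    public static int countEntries(File dir) throws IOException {
        // File.list() returns null (it does not throw) if dir is not a
        // directory or if an I/O error occurs while reading it.
        String[] files = dir.list();
        if (files == null) {
            throw new IOException("cannot read directory " + dir + ": list() returned null");
        }
        return files.length;
    }

    public static void main(String[] args) {
        // Defaults to a made-up path that should not exist.
        File dir = new File(args.length > 0 ? args[0] : "/hypothetical/missing/spell");
        try {
            System.out.println(dir + " has " + countEntries(dir) + " entries");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Pointing it at one of the real spell directories while the errors are occurring would at least show whether the directory is listable at that moment.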


And this one happened, I believe, when running the snapinstaller (done through cron)...

Caused by: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qab/data/index: files: null
        at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:587)
        at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:706)

We're running r614955.

Thanks.
Doug
