We just started hitting a FileNotFoundException, for no apparent reason, on both our regular
index and our spellchecker index, only a few minutes after we restarted Solr. I did some
searching and didn't find much that helped.
We started to do some load testing, and after about 10 minutes we started
getting these errors.
We hit the spellchecker on every request through a SpellcheckComponent that we created (i.e., code
ripped out of SpellCheckRequestHandler for now). It runs essentially the same code as the spellcheck
request handler when we specify a parameter (spellcheck=true).
We have 34 cores. All but two cores are fully optimized (haven't been updated in 2 months). Only
two cores are actively updated. We started Solr around 11:45am, not much happened until 12:27 when
we started load testing (just a few queries, maybe 100 updates).
find /home/dsteiger/local/solr/cores/*/data/index|wc -l => 414
find /home/dsteiger/local/solr/cores/*/data/spell|wc -l => 6
(Only the two 'active' cores use the spell checker.) So, not many files are open.
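Strictly speaking, those counts are files on disk, which is only a proxy for what the JVM has open at any moment. For anyone who wants the same per-directory count without a shell, here is a minimal Java sketch. The class and method names are made up for illustration, it uses only the JDK, and like find it counts each directory itself along with its contents.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class IndexFileCount {
    // Counts every entry under dir, including dir itself, like `find dir | wc -l`.
    public static long countEntries(Path dir) throws IOException {
        try (Stream<Path> entries = Files.walk(dir)) {
            return entries.count();
        }
    }

    // Sums countEntries over <coresRoot>/<core>/data/<subdir>,
    // skipping cores that have no such subdirectory.
    public static long countAcrossCores(Path coresRoot, String subdir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> cores = Files.newDirectoryStream(coresRoot)) {
            for (Path core : cores) {
                Path target = core.resolve("data").resolve(subdir);
                if (Files.isDirectory(target)) {
                    total += countEntries(target);
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: IndexFileCount <cores-root>");
            return;
        }
        Path root = Paths.get(args[0]);
        System.out.println("index: " + countAcrossCores(root, "index"));
        System.out.println("spell: " + countAcrossCores(root, "spell"));
    }
}
```

Run it with the cores directory as the only argument, e.g. /home/dsteiger/local/solr/cores.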
Does anyone have any idea what might cause the two errors below? When I restarted Solr around
11:45am, it was to test a new patch that sets the mergeFactor in the Lucene spellchecker to 2 instead
of 300, because we kept running into 'too many files open' errors when rebuilding more than one spell
index at a time. The spell indexes had been rebuilt manually using the mergeFactor of 300, then Solr
was restarted, and any subsequent rebuild of the spell index would use a mergeFactor of 2.
After we hit this error, I rebuilt the spell indexes with the new code, replicated them to the slave,
restarted Solr, and all has been well. We ran the load testing for more than an hour and the issue
hasn't returned.
Could the old spell indexes that were created using the high mergeFactor cause an issue like this
somehow? Could opening and closing searchers so quickly cause this? I don't have the slightest
idea. All of our search queries hit the slave, and the master just handles updates. The master had
no issues through all of this.
Caused by: java.io.IOException: cannot read directory org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qaa/data/spell: list() returned null
    at org.apache.lucene.index.SegmentInfos.getCurrentSegmentGeneration(SegmentInfos.java:115)
    at org.apache.lucene.index.IndexReader.indexExists(IndexReader.java:506)
    at org.apache.lucene.search.spell.SpellChecker.setSpellIndex(SpellChecker.java:102)
    at org.apache.lucene.search.spell.SpellChecker.<init>(SpellChecker.java:89)
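One relevant JDK detail: java.io.File.list() returns null, rather than throwing, when the path is not a directory or when an I/O error occurs, so the "list() returned null" message by itself does not say which of those happened. The sketch below is only an illustration of that behavior using the plain JDK (no Lucene). The class and method names are hypothetical, and whether Lucene's FSDirectory reaches File.list() in exactly this way in our revision is an assumption on my part. It falls back to java.nio, which throws an exception naming the cause (for example NoSuchFileException or NotDirectoryException).

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListNullDemo {
    // Returns the file names in dir, or throws an IOException that says
    // why the directory could not be listed.
    public static List<String> listOrExplain(File dir) throws IOException {
        String[] names = dir.list();
        if (names != null) {
            return Arrays.asList(names);
        }
        // File.list() gave no reason, so ask NIO, which throws a specific exception.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir.toPath())) {
            // Only reached if the directory became listable between the two calls.
            List<String> result = new ArrayList<>();
            for (Path p : stream) {
                result.add(p.getFileName().toString());
            }
            return result;
        }
    }

    public static void main(String[] args) {
        // Hypothetical path, assumed not to exist on the machine running this.
        File dir = new File(args.length > 0 ? args[0] : "/no/such/dir/data/spell");
        try {
            System.out.println(listOrExplain(dir));
        } catch (IOException e) {
            System.out.println("cannot list " + dir + ": " + e);
        }
    }
}
```

Pointing this at a spell or index directory while the error is occurring would at least show whether the directory is missing at that moment or whether some other I/O failure is behind the null.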
And this happened, I believe, when running the snapinstaller (done through cron):
Caused by: java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.FSDirectory@/home/dsteiger/local/solr/cores/qab/data/index: files: null
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:587)
    at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
    at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:93)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:706)
We're running r614955.
Thanks.
Doug