The user that runs our apps is configured to allow 65536 open files in limits.conf. We shouldn't even
come close to that number. Solr is the only app we have running on these machines as our app user.
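For anyone who wants to check this, the JVM can report its own descriptor usage. Here's a minimal sketch, assuming a Sun JDK on Unix (UnixOperatingSystemMXBean is a com.sun class, so this won't compile everywhere):

    import java.lang.management.ManagementFactory;
    import java.lang.management.OperatingSystemMXBean;
    import com.sun.management.UnixOperatingSystemMXBean;

    public class FdWatcher {
        public static void main(String[] args) {
            OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
            // On a Sun JVM on Unix the bean also implements the com.sun
            // interface that exposes file descriptor counts.
            if (os instanceof UnixOperatingSystemMXBean) {
                UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
                System.out.println("open fds: " + unix.getOpenFileDescriptorCount()
                        + " of max " + unix.getMaxFileDescriptorCount());
            }
        }
    }

Logging that periodically while the load test runs would show whether the process really stays far below the 65536 limit.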
We hit the same type of issue when we had our mergeFactor set to 40 for all of our indexes. We
lowered it to 5 and have been fine since.
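As I understand it, the mergeFactor in solrconfig.xml is just handed down to Lucene's IndexWriter, so the change amounts to something like this sketch (Lucene 2.x-era API; the index path is made up for illustration):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class LowerMergeFactor {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/example-index",
                    new StandardAnalyzer(), true);
            // With mergeFactor 5, merges kick in after ~5 segments per
            // level, so far fewer segment files sit open at any one time
            // than with mergeFactor 40.
            writer.setMergeFactor(5);
            writer.close();
        }
    }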
No errors in the snappuller for either core. The spellcheck index is rebuilt once a night around
midnight and copied to the slave afterwards. I had even rebuilt the spell index manually for the
two cores, pulled them, installed them, and ran a few test queries to make sure everything was
working before the load testing started (this was before we released the patch to lower the spell
index mergeFactor).
We were even getting errors trying to run our postCommit script on the slave (it doesn't end up
doing anything there, since it's the slave).
SEVERE: java.io.IOException: Cannot run program "./solr/bin/snapctl": java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(Unknown Source)
        at java.lang.Runtime.exec(Unknown Source)
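That failure makes sense even though the script is a no-op on the slave: the fork needs a few fresh descriptors for the child's stdin/stdout/stderr pipes, so once the table is full nothing can be exec'd at all. A standalone reproduction sketch, assuming a Unix box (drop the limit first, e.g. "ulimit -n 64", so the loop finishes quickly):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class TooManyOpenFiles {
        public static void main(String[] args) {
            // Hold the streams so GC can't close them behind our back.
            List<FileInputStream> held = new ArrayList<FileInputStream>();
            try {
                while (true) {
                    held.add(new FileInputStream("/etc/hosts")); // burns one fd per pass
                }
            } catch (IOException e) {
                System.out.println("descriptors exhausted: " + e.getMessage());
            }
            try {
                // Even a trivial command now fails with error=24, just
                // like the snapctl exec above.
                new ProcessBuilder("true").start();
            } catch (IOException e) {
                System.out.println("exec failed: " + e);
            }
        }
    }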
And a correction to my previous email: the errors started 10 -seconds- after load testing
started. This was about 40 minutes after Solr started, and fewer than 30 queries had been run on
the server before load testing began.
Load testing has been fine since I restarted Solr and rebuilt the spellcheck indexes with the
lowered mergeFactor.
Doug
Otis Gospodnetic wrote:
> Hi Doug,
>
> Sounds fishy, especially increasing/decreasing mergeFactor to "funny values"
> (try changing your OS setting instead).
>
> My guess is this is happening only with the 2 indices that are being modified
> and I'll guess that the FNFE is due to a bad/incomplete rsync from the master.
> Do snappuller logs mention any errors?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch