The user that runs our apps is configured in limits.conf to allow 65536 open files. We shouldn't even come close to that number; Solr is the only app we have running on these machines as our app user.
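For reference, the kind of limits.conf entries we're relying on look like this (the user name here is illustrative, not our actual account):

```
# /etc/security/limits.conf -- raise the open-file limit for the app user
appuser  soft  nofile  65536
appuser  hard  nofile  65536
```

Note these only take effect for new login sessions, so a process started before the change can still be running with the old limit.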

We hit the same type of issue when we had our mergeFactor set to 40 for all of our indexes. We lowered it to 5 and have been fine since.
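For anyone following along, the mergeFactor change is just the usual knob in solrconfig.xml (shown here as a sketch; the surrounding config is assumed, and a lower value means fewer segments held open at once at the cost of more merging):

```
<!-- solrconfig.xml -->
<mainIndex>
  <!-- was 40; lowered to 5 to keep the number of open segment files down -->
  <mergeFactor>5</mergeFactor>
</mainIndex>
```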

No errors in the snappuller for either core. The spellcheck index is rebuilt once a night around midnight and copied to the slave afterwards. I had even rebuilt the spell index manually for the two cores, pulled them, installed them, and tested to make sure it was working with a few queries before the load testing started (this was before we released the patch to lower the spell index mergeFactor).

We were even getting errors trying to run our postCommit script on the slave (it doesn't end up doing anything since it's the slave).

SEVERE: java.io.IOException: Cannot run program "./solr/bin/snapctl": java.io.IOException: error=24, Too many open files
        at java.lang.ProcessBuilder.start(Unknown Source)
        at java.lang.Runtime.exec(Unknown Source)
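When error=24 shows up, a quick way to see how close a process is to its limit is to count its descriptors under /proc (a Linux-only sketch; in practice you'd substitute the Solr JVM's PID, e.g. from `pgrep -f solr`, for the current shell used here):

```shell
# Count open file descriptors for a process and show its configured limit.
# $$ (the current shell) stands in for the Solr JVM's PID.
pid=$$
echo "open fds: $(ls /proc/"$pid"/fd | wc -l)"
grep 'Max open files' /proc/"$pid"/limits
```

Runtime.exec fails with error=24 too because fork/exec itself needs descriptors, which is why even the no-op postCommit script couldn't launch.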

And a correction to my previous email: the errors started 10 -seconds- after load testing started. This was about 40 minutes after Solr started, and fewer than 30 queries had been run on the server before load testing began.

Load testing has been fine since I restarted Solr and rebuilt the spellcheck indexes with the lowered mergeFactor.

Doug

Otis Gospodnetic wrote:
Hi Doug,

Sounds fishy, especially increasing/decreasing mergeFactor to "funny values" 
(try changing your OS setting instead).

My guess is this is happening only with the 2 indices that are being modified 
and I'll guess that the FNFE is due to a bad/incomplete rsync from the master.  
Do snappuller logs mention any errors?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
