Mads, some distributions require different steps for increasing max_open_files. Check how it works for CentOS specifically.
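Before changing anything, it can help to confirm which limit the running Solr JVM actually has, since a ulimit set in a shell does not necessarily reach a process started as a service. The following is a minimal Python sketch, assuming a Linux host where /proc/<pid>/limits is readable. The function names are hypothetical helpers, and the PID 21658 in the usage comment is only taken from the netstat line in the quoted message.

```python
import re
from pathlib import Path


def parse_max_open_files(limits_text):
    """Return (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because a limit may be 'unlimited'.
    Returns None if the row is not present.
    """
    for line in limits_text.splitlines():
        match = re.match(r"Max open files\s+(\S+)\s+(\S+)", line)
        if match:
            return match.group(1), match.group(2)
    return None


def read_max_open_files(pid):
    """Read the effective open-files limit of a running process (Linux only)."""
    return parse_max_open_files(Path(f"/proc/{pid}/limits").read_text())


# Usage on the Solr host (run as the Solr user or as root), e.g.:
#   read_max_open_files(21658)
```

If the soft limit reported here is still 4096 after raising the ulimit, the new value did not reach the Solr process, and the way the service is started on CentOS 7 is the place to look.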
Markus

-----Original message-----
> From:Mads Tomasgård Bjørgan <m...@dips.no>
> Sent: Thursday 30th June 2016 10:52
> To: solr-user@lucene.apache.org
> Subject: Solr node crashes while indexing - Too many open files
>
> Hello,
> We're indexing a large set of files using Solr 6.1.0, running a SolrCloud by
> utilizing ZooKeeper 3.4.8.
>
> We have two ensembles - and both clusters are running on three of their own
> respective VMs (CentOS 7). We first thought the error was due to CDCR - as we
> were trying to index a large amount of documents which had to be replicated
> to the target cluster. However, we got the same error even after turning off
> CDCR - which indicates CDCR wasn't the problem after all.
>
> After indexing between 20 000 and 35 000 documents to the source cluster, the
> File Descriptor Count reaches 4096 for one of the Solr nodes - and the
> respective node crashes. The count grows quite linearly over time. The
> remaining 2 nodes in the cluster are not affected at all, and their logs had
> no relevant entries. We found the following errors for the crashing node in
> its log:
>
> 2016-06-30 08:23:12.459 ERROR
> (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1
> x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS)
> [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1]
> o.a.s.u.StreamingSolrClients error
> java.net.SocketException: Too many open files
> (...)
> 2016-06-30 08:23:12.460 ERROR
> (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1
> x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS)
> [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1]
> o.a.s.u.StreamingSolrClients error
> java.net.SocketException: Too many open files
> (...)
> 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1
> x:DIPS_shard1_replica1] o.a.s.h.RequestHandlerBase
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
> 2 Async exceptions during distributed update:
> Too many open files
> Too many open files
> (...)
> 2016-06-30 08:23:12.461 INFO (qtp314337396-18) [c:DIPS s:shard1 r:core_node1
> x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1]
> webapp=/solr path=/update params={version=2.2} status=-1 QTime=5
> 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1
> x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall
> null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
> 2 Async exceptions during distributed update:
> Too many open files
> Too many open files
> (....)
>
> 2016-06-30 08:23:12.461 WARN (qtp314337396-18) [c:DIPS s:shard1 r:core_node1
> x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall invalid return code: -1
> 2016-06-30 08:23:38.108 INFO (qtp314337396-20) [c:DIPS s:shard1 r:core_node1
> x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1]
> webapp=/solr path=/select
> params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=https://10.0.106.115:443/solr/DIPS_shard1_replica1/&rows=10&version=2&q=*:*&NOW=1467275018057&isShard=true&wt=javabin&_=1467275017220}
> hits=30218 status=0 QTime=1
>
> Running netstat -n -p on the VM that yields the exceptions reveals that there
> are at least 1 800 TCP connections (we did not count exactly - the netstat
> output filled the entire PuTTY window, about 2 000 lines) waiting to be
> closed:
> tcp6 70 0 10.0.106.115:34531 10.0.106.114:443 CLOSE_WAIT 21658/java
> We're running the SolrCloud on 443, and the IPs belong to the VMs. We also
> tried adjusting the ulimit for the machine to 100 000 - without any result.
>
> Greetings,
> Mads
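
Rather than estimating the number of CLOSE_WAIT connections from a full terminal window, the netstat output can be tallied by state. This is a minimal Python sketch, assuming the column layout shown in the quoted netstat line (protocol, Recv-Q, Send-Q, local address, foreign address, state, PID/program). The function name count_tcp_states is a hypothetical helper.

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP connections per state in `netstat -n -p` output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local, foreign, state[, pid/program]
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[5]] += 1
    return counts


# Usage on the affected VM, e.g.:
#   import subprocess
#   out = subprocess.run(["netstat", "-n", "-p"],
#                        capture_output=True, text=True).stdout
#   print(count_tcp_states(out))
```

Running this at intervals while indexing would show whether the CLOSE_WAIT count climbs in step with the File Descriptor Count.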