That's true, but I was hoping there would be another way to solve this issue, as that approach isn't considered preferable in our situation.
Is it normal behavior for Solr to open over 4000 files without closing them properly? Is it, for example, possible to adjust the autoCommit settings in solrconfig.xml to force Solr to close the files? Any help is appreciated :-)

(Below the quoted thread I've added a sketch of the autoCommit block I have in mind, plus the limit-related changes and checks I understand apply to CentOS 7 - happy to be corrected on any of it.)

-----Original Message-----
From: Markus Jelsma [mailto:markus.jel...@openindex.io]
Sent: Thursday 30 June 2016 11:41
To: solr-user@lucene.apache.org
Subject: RE: Solr node crashes while indexing - Too many open files

Mads, some distributions require different steps for increasing max_open_files. Check how it works for CentOS specifically.
Markus

-----Original message-----
> From: Mads Tomasgård Bjørgan <m...@dips.no>
> Sent: Thursday 30th June 2016 10:52
> To: solr-user@lucene.apache.org
> Subject: Solr node crashes while indexing - Too many open files
>
> Hello,
> We're indexing a large set of files using Solr 6.1.0, running a SolrCloud that uses ZooKeeper 3.4.8.
>
> We have two ensembles, and each cluster runs on its own three VMs (CentOS 7). We first thought the error was caused by CDCR, as we were trying to index a large number of documents that had to be replicated to the target cluster. However, we got the same error even after turning off CDCR, which indicates CDCR wasn't the problem after all.
>
> After indexing between 20 000 and 35 000 documents to the source cluster, the file descriptor count reaches 4096 on one of the Solr nodes and that node crashes. The count grows roughly linearly over time. The remaining two nodes in the cluster are not affected at all, and their logs contain nothing relevant. We found the following errors in the crashing node's log:
>
> 2016-06-30 08:23:12.459 ERROR (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1 x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.u.StreamingSolrClients error
> java.net.SocketException: Too many open files
> (...)
> 2016-06-30 08:23:12.460 ERROR (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1 x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.u.StreamingSolrClients error
> java.net.SocketException: Too many open files
> (...)
> 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.h.RequestHandlerBase org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 2 Async exceptions during distributed update:
> Too many open files
> Too many open files
> (...)
> 2016-06-30 08:23:12.461 INFO (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1] webapp=/solr path=/update params={version=2.2} status=-1 QTime=5
> 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: 2 Async exceptions during distributed update:
> Too many open files
> Too many open files
> (....)
> 2016-06-30 08:23:12.461 WARN (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall invalid return code: -1
> 2016-06-30 08:23:38.108 INFO (qtp314337396-20) [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1] webapp=/solr path=/select params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=https://10.0.106.115:443/solr/DIPS_shard1_replica1/&rows=10&version=2&q=*:*&NOW=1467275018057&isShard=true&wt=javabin&_=1467275017220} hits=30218 status=0 QTime=1
>
> Running netstat -n -p on the VM that throws the exceptions reveals at least 1 800 TCP connections waiting to be closed (we didn't count them all - the netstat output filled the entire PuTTY window with around 2 000 lines), for example:
> tcp6 70 0 10.0.106.115:34531 10.0.106.114:443 CLOSE_WAIT 21658/java
> We're running the SolrCloud on port 443, and the IPs belong to the VMs. We also tried raising the ulimit for the machine to 100 000, without any effect.
>
> Greetings,
> Mads
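
For reference, this is the kind of autoCommit block in solrconfig.xml I was wondering about. The values below are only placeholders, not a recommendation, and I'm not sure a more aggressive commit policy even helps when the leaked descriptors look like TCP sockets in CLOSE_WAIT rather than index files:

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <maxDocs>10000</maxDocs>           <!-- hard-commit after this many docs... -->
        <maxTime>15000</maxTime>           <!-- ...or after this many ms -->
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>60000</maxTime>           <!-- soft-commit for search visibility only -->
      </autoSoftCommit>
    </updateHandler>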
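
On Markus' point about distribution-specific steps: my understanding is that on CentOS 7 the nofile limit has to be raised both for the user running Solr in /etc/security/limits.conf (or a limits.d file) and, if Solr is started as a systemd service, via LimitNOFILE in the unit, since systemd services ignore the PAM limits. Roughly like this, assuming Solr runs as a "solr" user and a "solr" service, which may not match our setup exactly:

    # /etc/security/limits.d/90-solr.conf  (user that runs Solr)
    solr  soft  nofile  65536
    solr  hard  nofile  65536

    # If Solr is started by systemd, add a drop-in instead, e.g.
    # /etc/systemd/system/solr.service.d/limits.conf:
    #   [Service]
    #   LimitNOFILE=65536
    # ...and then:
    systemctl daemon-reload
    systemctl restart solr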
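
Also, since we hit exactly 4096 - which I believe is the default hard limit on CentOS 7 - I suspect the ulimit of 100 000 never actually applied to the Solr process. Something along these lines should show the limit the process is really running with and count the CLOSE_WAIT sockets without scrolling through netstat (assuming Solr is the usual Jetty start.jar process; adjust the pgrep pattern if not):

    SOLR_PID=$(pgrep -f start.jar | head -1)         # PID of the running Solr/Jetty process
    grep 'Max open files' /proc/$SOLR_PID/limits     # limit the process actually has
    ls /proc/$SOLR_PID/fd | wc -l                    # descriptors currently open
    netstat -n -p 2>/dev/null | grep "$SOLR_PID/java" | grep -c CLOSE_WAIT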