Mads, some distributions require different steps for increasing max_open_files. 
Check how it works for CentOS specifically.
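The right mechanism on CentOS 7 depends on how Solr is started. /etc/security/limits.conf is read by pam_limits for PAM login sessions, while services started by systemd take their limit from LimitNOFILE= in the unit. Either way, it is worth confirming which limit the running Solr JVM actually received. A node failing at exactly 4096 descriptors suggests it may still be running with the old value. Below is a minimal Python sketch, Linux-only, that reads the "Max open files" row of /proc/<pid>/limits. The function names are hypothetical helpers, not part of Solr or any existing tool:

```python
import os
import resource
from typing import Optional, Tuple

# (soft, hard); None stands for "unlimited".
Limit = Tuple[Optional[int], Optional[int]]


def parse_max_open_files(limits_text: str) -> Limit:
    """Extract the "Max open files" row from the text of /proc/<pid>/limits."""

    def to_int(value: str) -> Optional[int]:
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Remaining columns are: soft limit, hard limit, units ("files").
            soft, hard = line[len("Max open files"):].split()[:2]
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' row found")


def max_open_files(pid: int) -> Limit:
    """Open-file limit that the running process `pid` actually has (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


def own_max_open_files() -> Tuple[int, int]:
    """(soft, hard) limit of the calling process itself, via getrlimit(2)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

For example, compare max_open_files(21658) (the java PID from the netstat output below) with own_max_open_files() run from the shell where ulimit was raised. A running process normally keeps the limits it was started with, so Solr would need a restart after the configuration change.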

Markus

 
 
-----Original message-----
> From:Mads Tomasgård Bjørgan <m...@dips.no>
> Sent: Thursday 30th June 2016 10:52
> To: solr-user@lucene.apache.org
> Subject: Solr node crashes while indexing - Too many open files
> 
> Hello,
> We're indexing a large set of files using Solr 6.1.0, running SolrCloud with
> ZooKeeper 3.4.8.
> 
> We have two ensembles, and each cluster runs on its own three VMs (CentOS 7).
> We first thought the error was caused by CDCR, since we were trying to index a
> large number of documents that had to be replicated to the target cluster.
> However, we got the same error even after turning off CDCR, which indicates
> that CDCR wasn't the problem after all.
> 
> After indexing between 20 000 and 35 000 documents to the source cluster, the
> file descriptor count of one of the Solr nodes reaches 4096 and that node
> crashes. The count grows roughly linearly over time. The remaining two nodes
> in the cluster are not affected at all, and their logs contain no relevant
> entries. We found the following errors in the crashing node's log:
> 
> 2016-06-30 08:23:12.459 ERROR 
> (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1
>  x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS) 
> [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] 
> o.a.s.u.StreamingSolrClients error
> java.net.SocketException: Too many open files
>                 (...)
> 2016-06-30 08:23:12.460 ERROR 
> (updateExecutor-2-thread-22-processing-https:////10.0.106.168:443//solr//DIPS_shard3_replica1
>  x:DIPS_shard1_replica1 r:core_node1 n:10.0.106.115:443_solr s:shard1 c:DIPS) 
> [c:DIPS s:shard1 r:core_node1 x:DIPS_shard1_replica1] 
> o.a.s.u.StreamingSolrClients error
> java.net.SocketException: Too many open files
>                 (...)
> 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 
> x:DIPS_shard1_replica1] o.a.s.h.RequestHandlerBase 
> org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  2 Async exceptions during distributed update:
> Too many open files
> Too many open files
>                 (...)
> 2016-06-30 08:23:12.461 INFO  (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 
> x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1]  
> webapp=/solr path=/update params={version=2.2} status=-1 QTime=5
> 2016-06-30 08:23:12.461 ERROR (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 
> x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall 
> null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
>  2 Async exceptions during distributed update:
> Too many open files
> Too many open files
>                 (....)
> 
> 2016-06-30 08:23:12.461 WARN  (qtp314337396-18) [c:DIPS s:shard1 r:core_node1 
> x:DIPS_shard1_replica1] o.a.s.s.HttpSolrCall invalid return code: -1
> 2016-06-30 08:23:38.108 INFO  (qtp314337396-20) [c:DIPS s:shard1 r:core_node1 
> x:DIPS_shard1_replica1] o.a.s.c.S.Request [DIPS_shard1_replica1]  
> webapp=/solr path=/select 
> params={df=_text_&distrib=false&fl=id&fl=score&shards.purpose=4&start=0&fsv=true&shard.url=https://10.0.106.115:443/solr/DIPS_shard1_replica1/&rows=10&version=2&q=*:*&NOW=1467275018057&isShard=true&wt=javabin&_=1467275017220}
>  hits=30218 status=0 QTime=1
> 
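To confirm the linear growth and estimate when a node will hit its limit, you can sample the descriptor count from /proc/<pid>/fd while indexing. Below is a minimal sketch. It assumes Linux and permission to inspect the Solr process (same user or root). The helper names are hypothetical, and the rate estimate simply assumes the growth is linear, as described above:

```python
import os
import time
from typing import List


def count_open_fds(pid: int) -> int:
    """Number of file descriptors `pid` currently has open (Linux only).

    Listing /proc/<pid>/fd requires running as the same user as the
    process (e.g. the solr user) or as root.
    """
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid: int, samples: int, interval_s: float) -> List[int]:
    """Read the open-descriptor count `samples` times, `interval_s` seconds apart."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i < samples - 1:  # no need to sleep after the last reading
            time.sleep(interval_s)
    return counts


def fd_growth_per_second(counts: List[int], interval_s: float) -> float:
    """Average growth rate between the first and last sample (assumes linear growth)."""
    if len(counts) < 2:
        raise ValueError("need at least two samples")
    return (counts[-1] - counts[0]) / (interval_s * (len(counts) - 1))
```

For example, counts = sample_fd_counts(21658, samples=10, interval_s=30.0) during indexing, followed by fd_growth_per_second(counts, 30.0). Dividing the remaining headroom (limit minus the current count) by that rate gives a rough time until the node fails.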
> Running netstat -n -p on the VM that throws the exceptions reveals at least
> 1 800 TCP connections waiting to be closed (we didn't count them exactly; the
> netstat output filled the entire PuTTY window of 2 000 lines), for example:
> tcp6      70      0 10.0.106.115:34531      10.0.106.114:443        CLOSE_WAIT  21658/java
> We're running SolrCloud on port 443, and the IPs belong to the VMs. We also
> tried raising the ulimit for the machine to 100 000, without any effect.
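CLOSE_WAIT means the remote peer has closed its end and the local application has not yet closed its socket. A large and growing number of them on the crashing node therefore points to connections the Solr JVM is not closing. To count them exactly instead of scrolling through netstat output, you can read /proc/net/tcp and /proc/net/tcp6, where the st column holds the state in hex (08 is CLOSE_WAIT). Below is a minimal sketch; the helper names are hypothetical. Unlike netstat -p, it counts every socket in the network namespace, not only those of the Solr process:

```python
from collections import Counter
from typing import Optional

# Linux TCP state codes as printed (hex) in the "st" column of /proc/net/tcp*.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def parse_proc_net_tcp(text: str, remote_port: Optional[int] = None) -> Counter:
    """Count sockets per TCP state in the contents of /proc/net/tcp or tcp6.

    If `remote_port` is given, only connections to that remote port count.
    """
    counts: Counter = Counter()
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        remote_addr, state = fields[2], fields[3]
        # Addresses look like HEXADDR:HEXPORT; the port is after the last colon.
        port = int(remote_addr.rsplit(":", 1)[1], 16)
        if remote_port is not None and port != remote_port:
            continue
        counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts


def count_tcp_states(remote_port: Optional[int] = None) -> Counter:
    """Combine the IPv4 and IPv6 tables (system-wide, not per process)."""
    total: Counter = Counter()
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                total.update(parse_proc_net_tcp(f.read(), remote_port))
        except FileNotFoundError:
            pass  # e.g. IPv6 disabled on this host
    return total
```

For example, count_tcp_states(remote_port=443)["CLOSE_WAIT"] gives the number of CLOSE_WAIT connections whose remote port is 443. Running it repeatedly during indexing shows whether that number tracks the growing descriptor count.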
> 
> Greetings,
> Mads
> 
