Hi,

I've been running a 3-node SolrCloud setup on SOLR 4.4 for some time. The 
cloud hosts about 40 small collections that receive updates once a day. The 
collections use different shard and replication configurations (varying from 
2 shards without replication to 2 shards with 3 replicas).

After Tomcat has been running for a couple of weeks, I notice that the number 
of open files has increased dramatically. Most of those files are deleted tlog 
files that SOLR keeps open:

eric@node1:/ # lsof -np 16810 | grep deleted | wc -l
36345

Those files are no longer on disk, but SOLR still holds open handles to them. 
My disk usage is going through the roof: 6GB is currently 'in use' by deleted 
but still open files. When I restart Tomcat, the space is freed and the cycle 
starts all over again. All of my nodes show this behavior.
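
A minimal sketch of how the lsof count above could be extended to total the 
space those handles pin. It assumes lsof's default column layout (COMMAND PID 
USER FD TYPE DEVICE SIZE/OFF NODE NAME), so the size is the seventh field; the 
function name deleted_summary is made up for illustration:

```shell
#!/bin/sh
# Summarise deleted-but-still-open files from `lsof -np PID` output on stdin.
# Assumption: default lsof columns, i.e. SIZE/OFF is field 7 and the NAME
# column ends in "(deleted)" for unlinked files.
deleted_summary() {
  awk '/\(deleted\)$/ { n++; bytes += $7 }
       END { printf "%d files, %.0f bytes\n", n, bytes }'
}

# Usage (16810 is the Tomcat PID from the command above):
#   lsof -np 16810 | deleted_summary
```

If the lsof build on a node prints extra columns (for example a TID column), 
the field number would need adjusting.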

At first I thought it had something to do with a lack of commits. But it 
happens on all my collections, even the ones with fast autoCommit:

    <autoCommit>
      <maxDocs>5000</maxDocs>
      <maxTime>120000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

My update process always triggers a commit or rollback and updates are showing 
up correctly.

I read something about SOLR having TCP connections in CLOSE_WAIT. The only 
CLOSE_WAIT connections I see are between the nodes, and there are only about 
10 of them. Those connections can't be causing 36k open files, right?
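
A rough sketch of how the CLOSE_WAIT connections could be counted, assuming 
the usual Linux `netstat -tan` layout where the state is the sixth column; the 
function name count_close_wait is hypothetical. Each socket occupies a single 
file descriptor, so the result can be compared directly with the lsof count:

```shell
#!/bin/sh
# Count TCP connections in CLOSE_WAIT from `netstat -tan` output on stdin.
# Assumption: columns are Proto Recv-Q Send-Q Local Foreign State.
count_close_wait() {
  awk '$6 == "CLOSE_WAIT" { n++ } END { printf "%d\n", n }'
}

# Usage:
#   netstat -tan | count_close_wait
```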

Any suggestions/tips? At the moment, I have to restart my leader every couple 
of weeks and that's not really something I would like to do :)

Best regards,
Eric Bus
