solr 1.4 full index replication not closing previous index file handles on slaves

Nicholas Letourneau Thu, 21 Jan 2010 12:05:18 -0800

Description of our setup: 

We rebuild our entire multi-core index nightly on a master server. 
Replication is always triggered manually, and a post-replication cleaning 
script is then run to remove the previous day's index and free enough drive 
space for the following day.
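
For illustration only, here is a minimal sketch of the selection step such a cleaning script might perform. It assumes index directories are named index.<14-digit timestamp>, as in the lsof listing in this message, and that the newest timestamp is the live index. The function name dirs_to_remove is hypothetical and this is not our actual script.

```python
import re

# Assumption: replicated index directories look like index.YYYYMMDDhhmmss.
INDEX_DIR = re.compile(r"^index\.\d{14}$")

def dirs_to_remove(names):
    """Return every timestamped index directory except the newest one."""
    # Fixed-width timestamps sort chronologically as plain strings.
    stamped = sorted(n for n in names if INDEX_DIR.match(n))
    return stamped[:-1]

listing = ["index", "index.20100120020120", "index.20100121084639", "index.properties"]
print(dirs_to_remove(listing))
```

A real script would still have to confirm which directory Solr is actually serving before deleting anything.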


After the cleaning script is run, Solr always still holds open file handles to 
the deleted files.

example lsof:

/data/solr/shards/19/data/index.20100120020120/_1n.tis (deleted)
/data/solr/shards/19/data/index.20100120020120/_1n.frq (deleted)
/data/solr/shards/19/data/index.20100120020120/_1n.prx (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.fdt (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.fdx (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.tvx (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.tvd (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.tvf (deleted)
/data/solr/shards/19/data/index.20100120020120/_1n.nrm (deleted)
/data/solr/shards/19/data/index.20100121084639/_1m.tis
/data/solr/shards/19/data/index.20100121084639/_1m.frq
/data/solr/shards/19/data/index.20100121084639/_1m.prx
/data/solr/shards/19/data/index.20100121084639/_0.fdt
/data/solr/shards/19/data/index.20100121084639/_0.fdx
/data/solr/shards/19/data/index.20100121084639/_0.tvx
/data/solr/shards/19/data/index.20100121084639/_0.tvd
/data/solr/shards/19/data/index.20100121084639/_0.tvf
/data/solr/shards/19/data/index.20100121084639/_1m.nrm

Because these files are never closed, drive usage keeps going up day after 
day until at some point we have to restart Solr in order to reclaim the 
space.
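
A minimal sketch of how the stale handles could be tallied per index directory, assuming input lines shaped like the NAME column of the lsof output above. The helper name stale_index_dirs is hypothetical.

```python
import posixpath
from collections import Counter

DELETED_SUFFIX = " (deleted)"

def stale_index_dirs(lsof_names):
    """Count deleted-but-still-open files per parent directory."""
    counts = Counter()
    for line in lsof_names:
        name = line.strip()
        if name.endswith(DELETED_SUFFIX):
            # Drop the " (deleted)" marker to recover the original path.
            path = name[: -len(DELETED_SUFFIX)]
            counts[posixpath.dirname(path)] += 1
    return dict(counts)

sample = [
    "/data/solr/shards/19/data/index.20100120020120/_1n.tis (deleted)",
    "/data/solr/shards/19/data/index.20100120020120/_0.fdt (deleted)",
    "/data/solr/shards/19/data/index.20100121084639/_1m.tis",
]
print(stale_index_dirs(sample))
```

Any directory that shows up in the result still has its disk space pinned by the Solr process, even though the files no longer appear in the filesystem.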

I have tried committing/optimizing the slave core, setting reopenReaders to 
true and false, changing lockType to none, and setting deletionPolicy to 0 
and 1 (not that I expected these things to solve it in this scenario, but I 
was grasping at straws trying to find anything loosely related). I have also 
tried reloading the core from CoreAdmin, and adding a single document to the 
core and then committing. We have also tried changing the cleaning script to 
run an hour after replication. None of these things have changed anything.

Any suggestions as to what might be the cause? Is there any specific source I 
should look at in Lucene/Solr to get a better understanding of how it handles 
the switch to a new index during replication, so I can find the root cause 
of the file handles not being released?

Thanks
Nicholas Letourneau
Software Engineer
Trulia.com
