Description of our setup: We rebuild our entire multi-core index nightly on a master server. Replication is always triggered manually, and a post-replication cleaning script is run to remove the previous day's index and free enough drive space for the following day.
After the cleaning script is run, there are always deleted file handles held open by Solr. Example lsof output:

/data/solr/shards/19/data/index.20100120020120/_1n.tis (deleted)
/data/solr/shards/19/data/index.20100120020120/_1n.frq (deleted)
/data/solr/shards/19/data/index.20100120020120/_1n.prx (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.fdt (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.fdx (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.tvx (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.tvd (deleted)
/data/solr/shards/19/data/index.20100120020120/_0.tvf (deleted)
/data/solr/shards/19/data/index.20100120020120/_1n.nrm (deleted)
/data/solr/shards/19/data/index.20100121084639/_1m.tis
/data/solr/shards/19/data/index.20100121084639/_1m.frq
/data/solr/shards/19/data/index.20100121084639/_1m.prx
/data/solr/shards/19/data/index.20100121084639/_0.fdt
/data/solr/shards/19/data/index.20100121084639/_0.fdx
/data/solr/shards/19/data/index.20100121084639/_0.tvx
/data/solr/shards/19/data/index.20100121084639/_0.tvd
/data/solr/shards/19/data/index.20100121084639/_0.tvf
/data/solr/shards/19/data/index.20100121084639/_1m.nrm

Because these files are never closed, drive usage keeps climbing day after day until we eventually have to restart Solr to reclaim the space.

I have tried committing/optimizing the slave core, setting reopenReaders to true and false, changing lockType to none, and changing deletionPolicy to 0 and 1 (not that I expected these things to solve it in this scenario, but I was grasping at straws trying to find anything loosely related). I have also tried reloading the core from coreadmin, and adding a single document to the core and then committing. We have also tried changing the cleaning script to run an hour after replication. None of these things has changed anything.

Any suggestions as to what might be the cause?
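For anyone who wants to reproduce the check behind the lsof listing above, here is a minimal sketch in Python. It is not part of our setup; the function name `deleted_open_files` and the `proc_root` parameter are made up for illustration. It assumes a Linux host, where `/proc/<pid>/fd` holds one symlink per open descriptor and the link target of an unlinked file ends in " (deleted)".

```python
import os

# Linux appends this marker to the link target of an unlinked-but-open file.
DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid, proc_root="/proc"):
    """Return (path, size_in_bytes) for each deleted file the process still holds open.

    `proc_root` is a hypothetical parameter added so the scan can be pointed
    at a directory other than /proc.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    results = []
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # descriptor was closed while we were scanning
        if not target.endswith(DELETED_SUFFIX):
            continue
        try:
            # On a real /proc, stat follows the link to the still-open inode,
            # so this is the space that cannot be reclaimed yet.
            size = os.stat(link).st_size
        except OSError:
            size = 0  # size unavailable; still report the path
        results.append((target[: -len(DELETED_SUFFIX)], size))
    return results
```

Calling `deleted_open_files(pid)` with the pid of the Solr JVM (run as the same user or as root) and summing the sizes should give a rough figure for how much drive space is pinned by the old index files.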
Is there any specific source I should look at in Lucene/Solr to get a better understanding of how it handles the switch to a new index during replication, so I can find the root cause of the file handles not being released?

Thanks,
Nicholas Letourneau
Software Engineer
Trulia.com