On 11/27/2012 5:46 AM, Erick Erickson wrote:
To see how much of an issue it is, look at the admin>>statistics page. The
delta between numDocs and maxDocs is the number of non-expunged deletes in
your index. That may ease your temptation to, as Walter says, turn that
knob.
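The arithmetic behind that statistics-page check is simple: deleted-but-not-yet-expunged documents = maxDoc - numDocs. Here is a minimal Python sketch. It assumes you have already read the two values off the statistics page, and the function name is hypothetical:

```python
from typing import Tuple


def deleted_doc_stats(num_docs: int, max_doc: int) -> Tuple[int, float]:
    """Return (deleted-but-not-expunged count, deleted share of maxDoc in %).

    maxDoc counts every document still stored in the index segments,
    deleted ones included; numDocs counts only live documents.
    """
    if num_docs < 0 or max_doc < num_docs:
        raise ValueError("expected 0 <= numDocs <= maxDoc")
    deleted = max_doc - num_docs
    # Guard against an empty index to avoid dividing by zero.
    pct = 100.0 * deleted / max_doc if max_doc else 0.0
    return deleted, pct


# Hypothetical values, as they might be read off admin>>statistics.
print(deleted_doc_stats(num_docs=900, max_doc=1000))  # (100, 10.0)
```

The percentage is usually the more useful number, because it tells you whether the deletes are a meaningful fraction of the index or just noise.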

I wrote a status servlet that reports the number of deleted documents on each of my index shards, along with other useful information. It gathers the stats MBean info from all of my shards into one convenient location. Here is a screenshot of the status page. The production systems run 3.5.0, and the dev system runs a 4.1 snapshot checked out on 2012/11/26:
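Pulling those numbers into one place comes down to collecting numDocs and maxDoc for each shard and then summarizing them. The following is a rough Python sketch of only the summarizing step. It is not the servlet described here: it assumes the per-shard values have already been fetched, and the `ShardStats` type and shard names are hypothetical:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class ShardStats:
    name: str      # hypothetical shard label
    num_docs: int  # live documents
    max_doc: int   # live plus deleted-but-not-yet-expunged documents

    @property
    def deleted(self) -> int:
        return self.max_doc - self.num_docs


def status_report(shards: Iterable[ShardStats]) -> List[str]:
    """Render one line per shard, followed by a line of totals."""
    lines = []
    total_live = total_deleted = 0
    for s in shards:
        lines.append(f"{s.name}: live={s.num_docs} deleted={s.deleted}")
        total_live += s.num_docs
        total_deleted += s.deleted
    lines.append(f"TOTAL: live={total_live} deleted={total_deleted}")
    return lines


for line in status_report([ShardStats("s0", 1000, 1050),
                           ShardStats("tiny", 200, 210)]):
    print(line)
```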

http://dl.dropbox.com/u/97770508/statuspage.png

This is a quiet week for our system, so the shard that will be optimized tonight currently has only 13272 deleted documents. Normally that number would be much higher. An older version of the status page also showed the number of segments, but so far I haven't needed it.

For the large shards (13 million docs, 22GB in 3.5.0), I never see any merging from updates and deletes alone. Optimizing one of those shards takes about ten minutes. Currently, while a large shard is being optimized, my indexing program postpones all changes to the large shards and only lets new document inserts (which all go to the tiny shard) through. With Solr 4, I think I can drop that postponement without worrying about it.
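The postponement described above works like a gate. Inserts always pass through it, while other changes are queued until the optimize finishes and are then replayed in arrival order. This is a minimal Python sketch of that idea only. `ChangeRouter` is a hypothetical name, the `applied` list stands in for actually sending changes to Solr, and for simplicity it treats every update or delete as possibly touching a large shard:

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

Change = Tuple[str, str]  # (kind, doc_id); kind is "insert", "update" or "delete"


@dataclass
class ChangeRouter:
    """Hypothetical sketch: hold back changes to the large shards while one
    of them is being optimized; new inserts (tiny shard) are never held."""
    optimizing: bool = False
    deferred: Deque[Change] = field(default_factory=deque)
    applied: List[Change] = field(default_factory=list)  # stand-in for "sent to Solr"

    def submit(self, kind: str, doc_id: str) -> None:
        if self.optimizing and kind != "insert":
            self.deferred.append((kind, doc_id))  # postpone until optimize ends
        else:
            self.applied.append((kind, doc_id))

    def finish_optimize(self) -> None:
        """Optimize done: replay postponed changes in arrival order."""
        self.optimizing = False
        while self.deferred:
            self.applied.append(self.deferred.popleft())
```

Replaying in arrival order matters: if an update and a delete for the same document were both postponed, applying them out of order could leave the wrong final state.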

On the tiny shard, optimizing usually only takes about ten seconds, and my indexing system is otherwise idle for 50-59 seconds out of every minute, so doing it once an hour isn't hurting me. Because it runs so fast, I do that optimize in the same thread as the updates.
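Running the hourly optimize in the same thread as the updates needs only a timestamp check at the end of each indexing cycle. This is a minimal Python sketch under that assumption. The class name and the `optimize` callback are hypothetical, and the clock is injectable so the logic can be tested without waiting an hour:

```python
import time
from typing import Callable


class HourlyOptimizer:
    """Hypothetical sketch: call maybe_optimize() once per indexing cycle,
    from the thread that sends updates. It runs the supplied optimize
    callback at most once per interval, inline (blocking that thread)."""

    def __init__(self, optimize: Callable[[], None],
                 interval_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._optimize = optimize
        self._interval = interval_s
        self._clock = clock
        self._last = clock()  # first optimize happens one interval after start

    def maybe_optimize(self) -> bool:
        now = self._clock()
        if now - self._last >= self._interval:
            self._optimize()  # runs in this thread; updates wait until it returns
            self._last = now
            return True
        return False
```

Because the tiny shard optimizes in about ten seconds and the indexer is otherwise idle most of each minute, blocking the update thread this way costs little. A slow optimize would argue for a separate thread instead.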

I have looked into doing a commit with expungeDeletes instead of an optimize. It doesn't work for me. The percentage of deleted documents in my indexes is almost never high enough to trigger the expunge, and as far as I know, Solr currently has no config knob to change that percentage. If I haven't already filed a Jira issue for such a knob, I will. I would honestly like to avoid full optimizes, but right now there is no other way for me to get rid of deletes.
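The reason a low overall deleted percentage defeats expungeDeletes is that the decision is made per segment against a threshold. The Python sketch below is a simplified model of that kind of rule. The idea is loosely modeled on Lucene's TieredMergePolicy, which has a setting of this sort (forceMergeDeletesPctAllowed in 4.x, with a default of 10%), but treat the exact name and default as assumptions to verify for your Lucene version. The `Segment` type and the segment names are hypothetical:

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    name: str       # hypothetical segment name
    max_doc: int    # documents in the segment, deleted ones included
    del_count: int  # deleted documents in the segment


def segments_to_expunge(segments: List[Segment],
                        pct_allowed: float = 10.0) -> List[str]:
    """Return names of segments whose deleted share exceeds pct_allowed.

    Segments at or below the threshold are left alone, so their deletes
    stay in the index until a normal merge or a full optimize.
    """
    picked = []
    for seg in segments:
        if seg.max_doc == 0:
            continue
        pct = 100.0 * seg.del_count / seg.max_doc
        if pct > pct_allowed:
            picked.append(seg.name)
    return picked


segs = [Segment("_a", 1_000_000, 13_272), Segment("_b", 10_000, 2_000)]
print(segments_to_expunge(segs))  # ['_b']
```

In this model, the large segment with about 1.3% deletes is skipped. A configurable threshold is exactly the knob that would let those deletes be reclaimed without a full optimize.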

Thanks,
Shawn
