From your experience with your application, how big is the delta for query
time before and after a typical weekly optimize? 50%? 20%? 2%?

-- Jack Krupansky

-----Original Message----- From: Shawn Heisey
Sent: Tuesday, November 27, 2012 9:47 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4, optimizing while doing other updates?

On 11/27/2012 5:46 AM, Erick Erickson wrote:
> To see how much of an issue it is, look at the admin>>statistics page. The
> delta between numDocs and maxDoc is the number of non-expunged deletes in
> your index. That may ease your temptation to, as Walter says, turn that
> knob.

I wrote a status servlet that gives me the number of deleted documents
on all my index shards, along with other useful info.  It gathers stats
mbean info from all my shards into one convenient location.  Here you
can see a screenshot of the status page.  The production systems run
3.5.0; the dev system is a 4.1 snapshot checked out 2012/11/26:

http://dl.dropbox.com/u/97770508/statuspage.png

This is a quiet week for our system ... the shard that will be optimized
tonight currently has 13272 deleted documents. Normally that would be
much higher.  An older version of the status page includes the number of
segments, but I haven't seen a need for that so far.
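
For anyone who wants the same number without a servlet, here is a minimal
sketch of the arithmetic, assuming you have already pulled numDocs and maxDoc
for each shard from the stats output. The function names and the shard name
are made up for illustration, and the example figures are not taken from the
status page above.

```python
from typing import Dict, Tuple


def deleted_stats(num_docs: int, max_doc: int) -> Tuple[int, float]:
    """Return (deleted_count, deleted_percent) for one index.

    deleted_count is maxDoc - numDocs, i.e. deletes not yet expunged.
    """
    if num_docs < 0 or max_doc < num_docs:
        raise ValueError("expected 0 <= numDocs <= maxDoc")
    deleted = max_doc - num_docs
    # Guard against an empty index to avoid dividing by zero.
    percent = 100.0 * deleted / max_doc if max_doc else 0.0
    return deleted, percent


def summarize(shards: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, float]]:
    """Map shard name -> (deleted_count, deleted_percent).

    Input values are (numDocs, maxDoc) pairs.
    """
    return {name: deleted_stats(n, m) for name, (n, m) in shards.items()}


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    example = {"shard_a": (12_986_728, 13_000_000)}
    for name, (deleted, pct) in summarize(example).items():
        print(f"{name}: {deleted} deleted ({pct:.2f}% of maxDoc)")
```

The percentage is the more useful figure when deciding whether an optimize
is worth the I/O, since a few thousand deletes in a 13 million document
shard is a very small fraction.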

For the large shards (13 million docs, 22GB in 3.5.0), I never see any
merging from just doing updates/deletes.  It takes about ten minutes to
optimize one of those shards.  Currently, my indexing program postpones
all changes to those shards during the large optimize, only allowing new
document inserts (which all go to the tiny shard) to happen.  With
Solr 4, I think I can eliminate that postponement and not worry.
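
To make the postponement concrete, here is a rough sketch of the idea. This
is not the actual indexing program; the class, its method names, and the
shard names are all hypothetical. Changes aimed at a large shard are queued
while an optimize is running, and inserts to the tiny shard go straight
through.

```python
from collections import deque
from typing import Callable, Deque, Tuple


class DeferringIndexer:
    """Queue changes to large shards while an optimize is in progress."""

    def __init__(self, apply_change: Callable[[str, str], None], tiny_shard: str):
        self._apply = apply_change      # callback that actually sends the change
        self._tiny = tiny_shard         # shard that always accepts new inserts
        self._optimizing = False
        self._pending: Deque[Tuple[str, str]] = deque()

    def begin_optimize(self) -> None:
        self._optimizing = True

    def end_optimize(self) -> None:
        """Mark the optimize finished and replay postponed changes in order."""
        self._optimizing = False
        while self._pending:
            shard, change = self._pending.popleft()
            self._apply(shard, change)

    def submit(self, shard: str, change: str) -> None:
        """Apply a change now, or postpone it if its shard is being protected."""
        if self._optimizing and shard != self._tiny:
            self._pending.append((shard, change))
        else:
            self._apply(shard, change)
```

Replaying in arrival order matters if a postponed delete and a later
reinsert touch the same document.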

On the tiny shard, optimizing usually only takes about ten seconds, and
my indexing system is otherwise idle for 50-59 seconds out of every
minute, so doing it once an hour isn't hurting me.  Because it runs so
fast, I do that optimize in the same thread as the updates.

I have looked into the possibility of doing a commit with
expungeDeletes, without an optimize.  It doesn't work for me.  The
percentage of deleted documents in my indexes is almost never high
enough to trigger the expunge, and to my knowledge, Solr currently
doesn't have a config knob to change the percentage.  If I haven't
already filed a JIRA issue for such a configuration knob, I will.  I
would honestly like to avoid doing full optimizes, but there is
currently no other way for me to get rid of deletes.
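
For reference, a commit with expungeDeletes is just an XML commit message
with that attribute set, posted to the update handler. The sketch below only
builds the request and does not send it. The helper name and the core URL in
the usage note are assumptions for illustration, so adjust them to your own
setup.

```python
from urllib.request import Request


def build_expunge_commit(core_url: str) -> Request:
    """Build (but do not send) a POST asking Solr to commit with expungeDeletes."""
    update_url = core_url.rstrip("/") + "/update"
    # Standard XML update message; expungeDeletes asks for deletes to be merged away.
    body = b'<commit expungeDeletes="true"/>'
    return Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen, for example
urlopen(build_expunge_commit("http://localhost:8983/solr/collection1")).
As described above, this may do nothing visible if the share of deleted
documents is below the threshold that triggers the expunge.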

Thanks,
Shawn
