Hi, folks,
I am using Solr 1.3 pretty successfully, but am running into an issue that
hits once in a long while. I'm still using 1.3 since I have some custom
code I will have to port forward to 1.4.
My basic setup is that I have data sources continually pushing data into
Solr, around 20K adds per day. The index is currently around 100G, stored
on local disk on a fast Linux server. I'm trying to make new docs
searchable as quickly as possible, so I currently have autocommit set to
15s. I originally had it at 3s, but that seemed a little too unstable. I
never optimize the index, since an optimize locks things up solid for two
hours, dropping docs until it completes. I'm using the default
segment merging settings.
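For reference, the autocommit section of my solrconfig.xml looks roughly
like this (only maxTime is set; I haven't configured a maxDocs threshold):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- commit pending docs every 15 seconds -->
    <maxTime>15000</maxTime>
  </autoCommit>
</updateHandler>
```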
Every once in a while I get a socket timeout when trying to add a
document. I traced it to a 20s timeout on the client side and then found
the corresponding point in the Solr log:
Jan 13, 2010 2:59:15 PM org.apache.solr.core.SolrCore execute
INFO: [tales] webapp=/solr path=/update params={} status=0 QTime=2
Jan 13, 2010 2:59:15 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
Jan 13, 2010 2:59:56 PM org.apache.solr.search.SolrIndexSearcher <init>
INFO: Opening searc...@26e926e9 main
Jan 13, 2010 2:59:56 PM org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
Solr locked up for 41 seconds here while doing some of the commit work.
So, I have a few questions.
Is this related to GC?
Does Solr always lock up when merging segments and I just have to live with
losing the doc I want to add?
Is there a timeout that would guarantee me a write success?
Should I just retry in this situation? If so, how do I distinguish between
this case and Solr actually being down?
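To make the retry question concrete, here's the kind of wrapper I'm
imagining on my side (a Python sketch; add_doc is a stand-in for my actual
client's add call, and the retry count and backoff are numbers I made up):

```python
import time

def add_with_retry(add_doc, doc, retries=3, backoff=5.0):
    """Try to add a document, retrying after socket timeouts.

    add_doc is a placeholder for the real client call; it is expected
    to raise an exception (e.g. socket.timeout) on failure.
    """
    for attempt in range(retries):
        try:
            return add_doc(doc)
        except Exception:
            if attempt == retries - 1:
                raise  # give up: Solr may really be down
            # back off a bit longer each time, hoping to outlast the commit pause
            time.sleep(backoff * (attempt + 1))
```

The problem is that this can't tell a 41-second commit pause apart from a
genuinely dead server without just waiting longer.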
I have already had issues in the past with too many open files, so
increasing the merge factor isn't an option.
On a related note, I had previously asked about optimizing and was told
that segment merging would take care of cleaning up deleted docs. However,
I have the following stats for my index:
numDocs : 2791091
maxDoc : 4811416
My understanding is that numDocs is the number of docs visible to
searches, while maxDoc also counts deleted docs that only disappear after
an optimize. How do I get this cleanup without using optimize, given that
it locks up Solr for multiple hours? I'm deleting old docs daily as well.
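Doing the math on those stats, nearly half the index is deleted docs
waiting to be reclaimed:

```python
numDocs = 2791091  # docs visible to searches
maxDoc = 4811416   # total, including deleted docs not yet reclaimed
deleted = maxDoc - numDocs
print(deleted)                          # 2020325 deleted docs still on disk
print(round(100.0 * deleted / maxDoc))  # 42 (percent of the index)
```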
Thanks for all the help,
Jerry