Hi folks,

I'm using Solr 1.3 pretty successfully, but I'm running into an issue that hits once in a long while. I'm still on 1.3 because I have some custom code I'd have to port forward to 1.4.

My basic setup: data sources continually push documents into Solr, around 20K adds per day. The index is currently around 100 GB, stored on local disk on a fast Linux server. I'm trying to make new docs searchable as quickly as possible, so autocommit is currently set to 15 seconds; I originally had it at 3 seconds, but that proved a little too unstable. I never optimize the index, since an optimize locks things up solid for about 2 hours, dropping docs until it completes. I'm using the default segment merge settings.

Every once in a while I get a socket timeout when trying to add a document. I traced it to a 20-second client timeout and then found the corresponding point in the Solr log:

    Jan 13, 2010 2:59:15 PM org.apache.solr.core.SolrCore execute
    INFO: [tales] webapp=/solr path=/update params={} status=0 QTime=2
    Jan 13, 2010 2:59:15 PM org.apache.solr.update.DirectUpdateHandler2 commit
    INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true)
    Jan 13, 2010 2:59:56 PM org.apache.solr.search.SolrIndexSearcher <init>
    INFO: Opening Searcher@26e926e9 main
    Jan 13, 2010 2:59:56 PM org.apache.solr.update.DirectUpdateHandler2 commit
    INFO: end_commit_flush

Solr locked up for 41 seconds here while doing some of the commit work. So, a few questions:

- Is this related to GC?
- Does Solr always lock up while merging segments, so that I just have to live with losing the doc I'm trying to add?
- Is there a timeout that would guarantee a write success?
- Should I just retry in this situation? If so, how do I distinguish between this and Solr actually being down?

I've already had issues in the past with too many open files, so increasing the merge factor isn't an option.

On a related note, I previously asked about optimizing and was told that segment merging would take care of cleaning up deleted docs. However, I have the following stats for my index:

    numDocs: 2791091
    maxDoc:  4811416

My understanding is that numDocs is the number of docs being searched and maxDoc is the number of docs including ones that would disappear after an optimize. How do I get this cleanup without using optimize, given that optimize locks up Solr for multiple hours? I'm deleting old docs daily as well. (My autocommit config, the retry logic I have in mind, and my daily cleanup are sketched below my signature, for reference.)

Thanks for all the help,
Jerry
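
P.S. The autocommit section of my solrconfig.xml looks roughly like this (reproduced from memory, so treat it as a sketch rather than an exact copy):

    <updateHandler class="solr.DirectUpdateHandler2">
      <autoCommit>
        <!-- commit pending docs at most every 15 seconds -->
        <maxTime>15000</maxTime>
      </autoCommit>
    </updateHandler>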
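
The retry logic I'm considering is roughly the following. This assumes a SolrJ client (CommonsHttpSolrServer); the class and method names are just mine, not anything from Solr. The part I'm unsure about is the heuristic: telling a stalled-but-alive server from a dead one by walking the exception cause chain.

    import java.net.ConnectException;
    import java.net.SocketTimeoutException;

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class RetryingAdd {

        // Retry an add a few times with a crude linear backoff.
        // A SocketTimeoutException in the cause chain suggests Solr
        // accepted the connection but stalled (e.g. blocked inside a
        // commit), so retrying seems worthwhile; a ConnectException
        // means nothing is listening on the port, i.e. Solr is
        // actually down, so fail fast instead of spinning.
        public static void addWithRetry(CommonsHttpSolrServer server,
                                        SolrInputDocument doc,
                                        int maxAttempts) throws Exception {
            for (int attempt = 1; ; attempt++) {
                try {
                    server.add(doc);
                    return;
                } catch (Exception e) { // SolrServerException or IOException
                    if (solrIsDown(e) || attempt >= maxAttempts) {
                        throw e;
                    }
                    Thread.sleep(5000L * attempt);
                }
            }
        }

        private static boolean solrIsDown(Throwable t) {
            for (Throwable c = t; c != null; c = c.getCause()) {
                if (c instanceof ConnectException) {
                    return true;   // connection refused: server is down
                }
                if (c instanceof SocketTimeoutException) {
                    return false;  // server up but busy: worth a retry
                }
            }
            return false;          // unknown failure: treat as retryable
        }
    }

The other half would be raising the client timeout via server.setSoTimeout() to something above the worst-case commit pause (41+ seconds here), but I'd rather understand the pause than just paper over it.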
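
And my daily cleanup is essentially this (the URL, the date field, and the retention window below are placeholders standing in for what my setup actually uses):

    // Runs once a night; "timestamp" and the 30-day window are stand-ins.
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr/tales");
    server.deleteByQuery("timestamp:[* TO NOW-30DAYS]");
    server.commit();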
I am using Solr 1.3 pretty successfully, but am running into an issue that hits once in a long while. I'm still using 1.3 since I have some custom code I will have to port forward to 1.4. My basic setup is that I have data sources continually pushing data into Solr, around 20K adds per day. The index is currently around 100G, stored on local disk on a fast linux server. I'm trying to make new docs searchable as quickly as possible, so I currently have autocommit set to 15s. I originally had 3s but that seems to be a little too unstable. I never optimize the index since optimize will lock things up solid for 2 hours, dropping docs until the optimize completes. I'm using the default segment merging settings. Every once in a while I'm getting a socket timeout when trying to add a document. I traced it to a 20s timeout and then found the corresponding point in the Solr log. Jan 13, 2010 2:59:15 PM org.apache.solr.core.SolrCore execute INFO: [tales] webapp=/solr path=/update params={} status=0 QTime=2 Jan 13, 2010 2:59:15 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: start commit(optimize=false,waitFlush=true,waitSearcher=true) Jan 13, 2010 2:59:56 PM org.apache.solr.search.SolrIndexSearcher <init> INFO: Opening searc...@26e926e9 main Jan 13, 2010 2:59:56 PM org.apache.solr.update.DirectUpdateHandler2 commit INFO: end_commit_flush Solr locked up for 41 seconds here while doing some of the commit work. So, I have a few questions. Is this related to GC? Does Solr always lock up when merging segments and I just have to live with losing the doc I want to add? Is there a timeout that would guarantee me a write success? Should I just retry in this situation? If so, how do I distinguish between this and Solr just being down? I already have had issues in the past with too many files open, so increasing the merge factor isn't an option. On a related note, I had previously asked about optimizing and was told that segment merging would take care of cleaning up deleted docs. However, I have the following stats for my index: numDocs : 2791091 maxDoc : 4811416 My understanding is that numDocs is the docs being searched and maxDoc is the number of docs including ones that will disappear after optimization. How do I get this cleanup without using optimize, since it locks up Solr for multiple hours. I'm deleting old docs daily as well. Thanks for all the help, Jerry