Thank you for your insight, Shawn; it is always valuable. A question: if I wait until the very end to issue a commit, wouldn't that mean I could lose everything if there was an OOM or some other server issue? I don't have any commit settings configured in my solrconfig.xml.

Steve

On Wed, Mar 9, 2016 at 8:32 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/9/2016 6:10 PM, Steven White wrote:
> > I'm indexing about 1 billion records (each is a small Solr doc, no
> > more than 20 bytes each). The logic is basically as follows:
> >
> >     while (data-of-1-billion) {
> >         read-1000-items from DB
> >         at-100-items send 100 items to Solr: i.e.:
> >             solrConnection.add(docs);
> >     }
> >     solrConnection.commit()
> >
> > I'm seeing the following exception from SolrJ:
> >
> >     org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://localhost:8983/solr/test_data
> <snip>
> > Which tells me it took Solr a bit over 5 sec. to complete the commit.
> >
> > Now when I created the Solr connection, I used 5 seconds like so:
> >
> >     solrClient.setConnectionTimeout(5000);
> >     solrClient.setSoTimeout(5000);
> >
> > Two questions:
> >
> > 1) Is the timeout error because of my use of 5000?
> > 2) Should I be calling "solrConnection.commit()" every now and then
> >    inside the loop?
>
> Yes, this problem is happening because you set the SoTimeout value to 5
> seconds. This is an inactivity timeout on the TCP socket. It's not
> clear whether the problem happened on the commit operation or on the add
> operation -- it could be either.
>
> Your SoTimeout value should either remain unset, or should be set to
> something *significantly* longer than you ever expect the request to
> take. I would suggest something between five and fifteen minutes. I
> use fifteen minutes. This is long enough that it should only be reached
> if there's a real problem, but short enough that my build program will
> not hang indefinitely, and will have an opportunity to send me email to
> tell me there's a problem.
>
> I would suggest that you don't do *any* commits until the end of the
> loop -- after all one billion docs have been indexed. If you want to do
> them in your loop, set up something that will do them far less
> frequently, perhaps every 100 times through the loop. You could include
> a commitWithin parameter on the add request instead of sending actual
> commits, which I would recommend you set to a fairly large value. I
> would use at least five minutes, but never less than one minute.
> Alternately, you could configure autoSoftCommit in your solrconfig.xml
> file. I would recommend a maxTime value on that config of at least five
> minutes.
>
> Also, consider increasing your batch size to something larger than 100
> or 1000. Use 10000 or more. With 20 byte documents, you could send a
> LOT of documents in each batch without worrying too much about memory.
>
> Regardless of what else you do with commits, if you're running at least
> Solr 4.0, your solrconfig.xml file should include an autoCommit section
> configured with openSearcher set to false and a maxTime between one and
> five minutes.
>
> By now, I hope you've seen a recommendation to read this blog post:
>
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks,
> Shawn
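
For reference, the autoCommit and autoSoftCommit settings described in the quoted reply might look roughly like the fragment below. This is only a sketch: the two maxTime values (one minute and five minutes, in milliseconds) are assumptions picked from the ranges Shawn gives, and the fragment assumes a stock updateHandler section in solrconfig.xml.

```xml
<!-- Sketch only: goes inside the existing <updateHandler> element of solrconfig.xml. -->
<updateHandler class="solr.DirectUpdateHandler2">

  <!-- Hard commit: writes pending documents to disk on a timer.
       openSearcher=false means it does not make them visible to searches.
       60000 ms (one minute) is an assumed value within the suggested
       one-to-five-minute range. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: makes documents visible to searches.
       300000 ms (five minutes) is an assumed value matching the
       "at least five minutes" suggestion. -->
  <autoSoftCommit>
    <maxTime>300000</maxTime>
  </autoSoftCommit>

</updateHandler>
```

With something like this in place, the indexing loop would not need to send explicit commits on its own schedule, apart from the final commit after the last batch.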