Thank you for your insights, Shawn; they are always valuable.

Question: if I wait until the very end to issue a commit, wouldn't that mean I
could lose everything if there was an OOM or some other server issue?  I
don't have any commit settings in my solrconfig.xml.

Steve

On Wed, Mar 9, 2016 at 8:32 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 3/9/2016 6:10 PM, Steven White wrote:
> > I'm indexing about 1 billion records (each is a small Solr doc, no more
> > than 20 bytes each).  The logic is basically as follows:
> >
> >     while (data-of-1-billion) {
> >         read 1000 items from the DB
> >         for every 100 items, send them to Solr, i.e.:
> >             solrConnection.add(docs);
> >     }
> >     solrConnection.commit()
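> >
> > In SolrJ terms, my loop is roughly the following (a minimal sketch, not my
> > actual code; fetchNextBatch() is a made-up stand-in for the real DB read,
> > and error handling is omitted):
> >
> >     import java.util.List;
> >     import org.apache.solr.client.solrj.impl.HttpSolrClient;
> >     import org.apache.solr.common.SolrInputDocument;
> >
> >     HttpSolrClient solrConnection =
> >         new HttpSolrClient("http://localhost:8983/solr/test_data");
> >
> >     List<SolrInputDocument> docs;
> >     while (!(docs = fetchNextBatch(100)).isEmpty()) {
> >         solrConnection.add(docs);   // send the batch; no commit here
> >     }
> >     solrConnection.commit();        // single commit at the very end
> >     solrConnection.close();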
> >
> > I'm seeing the following exception from SolrJ:
> >
> > org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> > waiting response from server at: http://localhost:8983/solr/test_data
> <snip>
> > Which tells me it took Solr a bit over 5 sec. to complete the commit.
> >
> > Now, when I created the Solr connection, I used 5 seconds, like so:
> >
> >     solrClient.setConnectionTimeout(5000);
> >     solrClient.setSoTimeout(5000);
> >
> > Two questions:
> >
> > 1) Is the time out error because of my use of 5000?
> > 2) Should I be calling "solrConnection.commit()" every now and then
> > inside the loop?
>
> Yes, this problem is happening because you set the SoTimeout value to 5
> seconds.  This is an inactivity timeout on the TCP socket.  It's not
> clear whether the problem happened on the commit operation or on the add
> operation -- it could be either.
>
> Your SoTimeout value should either remain unset, or should be set to
> something *significantly* longer than you ever expect the request to
> take.  I would suggest something between five and fifteen minutes.  I
> use fifteen minutes.  This is long enough that it should only be reached
> if there's a real problem, but short enough that my build program will
> not hang indefinitely, and will have an opportunity to send me email to
> tell me there's a problem.
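>
> As a sketch (the fifteen-minute figure is just what I use, not a magic
> number):
>
>     // Connection timeout stays short; the inactivity (socket) timeout
>     // is set far beyond any request you expect to see.
>     solrClient.setConnectionTimeout(15000);       // 15 seconds
>     solrClient.setSoTimeout(15 * 60 * 1000);      // 15 minutes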
>
> I would suggest that you don't do *any* commits until the end of the
> loop -- after all one billion docs have been indexed.  If you want to do
> them in your loop, set up something that will do them far less
> frequently, perhaps every 100 times through the loop.  You could include
> a commitWithin parameter on the add request instead of sending actual
> commits, which I would recommend you set to a fairly large value.  I
> would use at least five minutes, but never less than one minute.
> Alternately, you could configure autoSoftCommit in your solrconfig.xml
> file.  I would recommend a maxTime value on that config of at least five
> minutes.
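>
> With SolrJ, commitWithin can go on the add call itself; a minimal sketch
> using the five-minute value mentioned above:
>
>     // Ask Solr to make these docs visible within 5 minutes of receiving
>     // them, instead of sending explicit commits from the client.
>     solrClient.add(docs, 5 * 60 * 1000);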
>
> Also, consider increasing your batch size to something larger than 100
> or 1000.  Use 10000 or more.  With 20 byte documents, you could send a
> LOT of documents in each batch without worrying too much about memory.
>
> Regardless of what else you do with commits, if you're running at least
> Solr 4.0, your solrconfig.xml file should include an autoCommit section
> configured with openSearcher set to false and a maxTime between one and
> five minutes.
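>
> A sketch of what that autoCommit section of solrconfig.xml might look like
> (the 60-second maxTime is only an example within the range above):
>
>     <!-- Hard commit: flush index changes to disk and roll over the
>          transaction log regularly, but don't open a new searcher. -->
>     <autoCommit>
>       <maxTime>60000</maxTime>            <!-- 1 minute -->
>       <openSearcher>false</openSearcher>
>     </autoCommit>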
>
> By now, I hope you've seen a recommendation to read this blog post:
>
>
> http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks,
> Shawn
>
>
