On 3/9/2016 6:10 PM, Steven White wrote:
> I'm indexing about 1 billion records (each is a small Solr doc, no more than
> 20 bytes).  The logic is basically as follows:
>
>     while (data-of-1-billion) {
>         read-1000-items from DB
>         at-100-items send 100 items to Solr, i.e.:
>             solrConnection.add(docs);
>     }
>     solrConnection.commit()
>
> I'm seeing the following exception from SolrJ:
>
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at: http://localhost:8983/solr/test_data
<snip>
> Which tells me it took Solr a bit over 5 sec. to complete the commit.
>
> Now when I created the Solr connection, I used 5 seconds like so:
>
>     solrClient.setConnectionTimeout(5000);
>     solrClient.setSoTimeout(5000);
>
> Two questions:
>
> 1) Is the time out error because of my use of 5000?
> 2) Should I be calling "solrConnection.commit()" every now and then inside
> the loop?

Yes, this problem is happening because you set the SoTimeout value to 5
seconds.  This is an inactivity timeout on the TCP socket.  It's not
clear whether the problem happened on the commit operation or on the add
operation -- it could be either.

Your SoTimeout value should either remain unset, or should be set to
something *significantly* longer than you ever expect the request to
take.  I would suggest something between five and fifteen minutes.  I
use fifteen minutes.  This is long enough that it should only be reached
if there's a real problem, but short enough that my build program will
not hang indefinitely, and will have an opportunity to send me email to
tell me there's a problem.
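
To make the "inactivity timeout" idea concrete, here is a rough sketch that uses only the JDK, not SolrJ.  The class and method names (SoTimeoutDemo, readTimesOut) are made up for illustration.  It connects to a local server that never sends anything and shows that the read gives up once the socket has been idle for the configured time.  A request that Solr takes longer than your SoTimeout to answer looks the same from the client side.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Illustration only: shows what a socket read timeout (SO_TIMEOUT) does. */
final class SoTimeoutDemo {

    /**
     * Connects to a local server that never replies and reports whether the
     * read gave up after soTimeoutMillis of inactivity.
     */
    static boolean readTimesOut(int soTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; this server never writes a byte.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis);
            try {
                // Blocks until data arrives or the inactivity timeout fires.
                client.getInputStream().read();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```

With a 5000 ms timeout, any add or commit that keeps the socket quiet for more than five seconds fails this way, even though the server may still finish the work.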

I would suggest that you don't do *any* commits until the end of the
loop -- after all one billion docs have been indexed.  If you want to do
them in your loop, set up something that will do them far less
frequently, perhaps every 100 times through the loop.  Instead of sending
explicit commits, you could include a commitWithin parameter on the add
request, set to a fairly large value.  I would use at least five minutes,
and never less than one minute.  Alternatively, you could configure
autoSoftCommit in your solrconfig.xml file.  I would recommend a maxTime
value on that config of at least five minutes.

Also, consider increasing your batch size to something larger than 100
or 1000.  Use 10000 or more.  With 20 byte documents, you could send a
LOT of documents in each batch without worrying too much about memory.
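
Here is a minimal sketch of the loop shape I have in mind: large batches, with no commit inside the loop.  It is plain Java with no SolrJ dependency.  BatchIndexer and indexInBatches are hypothetical names, and the sink callback stands in for wherever your code calls solrConnection.add(docs).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/** Sketch of a batching loop; the sink stands in for a SolrJ add(docs) call. */
final class BatchIndexer {

    /**
     * Drains the source in batches of batchSize and hands each batch to the
     * sink.  Returns the number of batches sent.  No commit is issued here.
     */
    static <T> int indexInBatches(Iterator<T> source, int batchSize,
                                  Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        int batchesSent = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(new ArrayList<>(batch)); // e.g. solrConnection.add(docs)
                batch.clear();
                batchesSent++;
            }
        }
        if (!batch.isEmpty()) {
            // Send the final, partially filled batch.
            sink.accept(new ArrayList<>(batch));
            batchesSent++;
        }
        return batchesSent;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 25_000; i++) {
            records.add(i);
        }
        int[] docsSent = {0};
        int batches = indexInBatches(records.iterator(), 10_000,
                b -> docsSent[0] += b.size());
        // A single solrConnection.commit() would go here, after the loop.
        System.out.println(batches + " batches, " + docsSent[0] + " docs");
    }
}
```

The main method sends 25000 stand-in records as two full batches of 10000 and one final batch of 5000.  In real code the source would be your DB cursor rather than an in-memory list.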

Regardless of what else you do with commits, if you're running at least
Solr 4.0, your solrconfig.xml file should include an autoCommit section
configured with openSearcher set to false and a maxTime between one and
five minutes.
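
As a rough illustration, the relevant part of solrconfig.xml could look something like the fragment below.  The specific values are my picks from the ranges mentioned above (one minute for autoCommit, five minutes for autoSoftCommit), not settings taken from your config, and the autoSoftCommit block is only needed if you go that route instead of commitWithin.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes to disk, but does not open a new searcher. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Optional soft commit: controls when new documents become visible. -->
  <autoSoftCommit>
    <maxTime>300000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Both maxTime values are in milliseconds.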

By now, I hope you've seen a recommendation to read this blog post:

http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn
