On 5/22/2013 9:08 AM, Justin Babuscio wrote:
We periodically rebuild our Solr index from scratch. We have built a
custom publisher that horizontally scales to increase write throughput. On
a given rebuild, we will have ~60 JVMs running with 5 threads that are
actively publishing to all Solr masters.
For each thread, we instantiate one StreamingUpdateSolrServer(
QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread.
Looking over all your details, the first thing I would try is reducing
maxFieldLength (in solrconfig.xml) to something slightly below
Integer.MAX_VALUE. Try setting it to 2 billion, or even something more
modest, in the millions. It's theoretically possible that using
Integer.MAX_VALUE is leading to an integer overflow somewhere. I've
been looking for evidence of this, but nothing has turned up yet.
There MIGHT be bugs in the Apache Commons libraries that SolrJ uses.
The next thing I would try is upgrading those component jars in your
application's classpath - httpclient, commons-io, commons-codec, etc.
Upgrading to a newer SolrJ version is also a good idea. Your notes
imply that you are using the default XML request writer in SolrJ. If
that's true, you should be able to use SolrJ 4.3 even with an older
Solr version, which would give you a server object based on
HttpComponents 4.x, whereas your current objects are based on
HttpClient 3.x. You would need to make some adjustments in your source
code. If you're not using the default XML request writer, you can get
a similar change by upgrading to SolrJ 3.6.2.
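To make that concrete, here's a rough, untested sketch of creating a
SolrJ 4.3 server object that still talks XML to an older Solr master.
The factory class name and URL are just placeholders, not anything
from your setup:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class SolrServerFactory {
    // Hypothetical helper: builds a SolrJ 4.3 HttpSolrServer that stays
    // compatible with an older (3.x) Solr master.
    public static HttpSolrServer create(String masterUrl) {
        HttpSolrServer server = new HttpSolrServer(masterUrl);
        // The default request writer already sends XML updates; forcing
        // the XML response parser avoids javabin version mismatches
        // between a 4.x client and a 3.x server.
        server.setParser(new XMLResponseParser());
        return server;
    }
}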
IMHO you should switch to HttpSolrServer (CommonsHttpSolrServer in SolrJ
3.5 and earlier). StreamingUpdateSolrServer (and its replacement in 3.6
and later, named ConcurrentUpdateSolrServer) has one glaring problem -
it never informs the calling application about any errors that it
encounters during indexing. It lies to you, telling you that
everything succeeded even when it hasn't.
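To illustrate the difference, here's a short sketch (the URL and field
values are made up) of how an HttpSolrServer-based publisher actually
sees an indexing error:

import java.io.IOException;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AddWithErrorHandling {
    public static void main(String[] args) {
        // Placeholder URL for one of your masters.
        HttpSolrServer server = new HttpSolrServer("http://master1:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "example-1");
        try {
            server.add(doc);   // throws if Solr or the HTTP layer fails
            server.commit();
        } catch (SolrServerException e) {
            // SUSS/CUSS would swallow this; here you can log, retry, or abort.
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}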
The one advantage that SUSS/CUSS has over its Http sibling is that it is
multi-threaded, so it can send updates concurrently. You seem to know
enough about how it works, so I'll just say that you don't need an
extra layer of complexity that is outside your control and that
refuses to throw exceptions when errors occur. You already have a
large-scale, concurrent, multi-threaded indexing setup, so SolrJ's
additional thread handling doesn't really buy you much.
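If it helps, this is the kind of per-thread publisher I have in mind:
a plain HttpSolrServer per master, with your own threads providing the
concurrency. The URL, batch size, and document IDs are placeholders
only:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class MasterPublisher implements Runnable {
    // One plain server object per master, per thread.
    private final HttpSolrServer server =
            new HttpSolrServer("http://solr-master-1:8983/solr");

    public void run() {
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        for (int i = 0; i < 100; i++) {   // illustrative batch of 100 docs
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Thread.currentThread().getName() + "-" + i);
            batch.add(doc);
        }
        try {
            server.add(batch);   // one synchronous, batched update request
        } catch (SolrServerException e) {
            e.printStackTrace(); // failure is visible; retry or abort here
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}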
Thanks,
Shawn