*Problem:*

We periodically rebuild our Solr index from scratch.  We have built a
custom publisher that horizontally scales to increase write throughput.  On
a given rebuild, we will have ~60 JVMs running with 5 threads that are
actively publishing to all Solr masters.

For each thread, we instantiate one StreamingUpdateSolrServer(queueSize=100, threadCount=2) per master, i.e., 20 servers per thread.
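For reference, the per-thread setup can be sketched against the SolrJ 3.x API as below. The URLs and helper names are placeholders, not our actual publisher code. One detail worth noting given the stack trace further down: StreamingUpdateSolrServer inherits setSoTimeout from CommonsHttpSolrServer, and no read timeout is set by default, so a runner stuck in a socket read has nothing to bound its wait.

```java
import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;

public class ThreadLocalClients {
    // One streaming client per master, built once per publishing thread.
    static List<StreamingUpdateSolrServer> build(List<String> masterUrls)
            throws MalformedURLException {
        List<StreamingUpdateSolrServer> servers =
                new ArrayList<StreamingUpdateSolrServer>();
        for (String url : masterUrls) {           // 20 masters -> 20 clients/thread
            StreamingUpdateSolrServer s =
                    new StreamingUpdateSolrServer(url, 100, 2); // queueSize=100, 2 runners
            // No read timeout is set by default; a hung master can pin a runner
            // in socketRead0 indefinitely.  setSoTimeout (inherited from
            // CommonsHttpSolrServer) would bound that wait, e.g.:
            // s.setSoTimeout(600000);  // 10 minutes; value is illustrative
            servers.add(s);
        }
        return servers;
    }
}
```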

At the end of a publish cycle (we publish in smaller chunks of ~5MM records),
we execute server.blockUntilFinished() on each of the 20 servers on each
thread (100 calls in total across a JVM's 5 threads).  Before we applied a
recent change, this always ran to completion.  There were a few hang-ups on
publishes, but we consistently re-published our entire corpus in 6-7 hours.
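For readers unfamiliar with the streaming client, the queue-plus-runner pattern it implements, and what blockUntilFinished() waits for, can be modeled with plain JDK primitives. This is an illustrative toy model, not the SolrJ source: adds go onto a bounded queue, runner threads drain it, and blockUntilFinished() spins until the queue is empty and no runner is mid-request. If a runner hangs in a socket read, the "active" count never drops to zero and the drain never returns, which matches our symptom.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of a streaming update client (NOT the SolrJ implementation).
public class MiniStreamingClient {
    private final BlockingQueue<String> queue;
    private final ExecutorService runners;
    private final AtomicInteger active = new AtomicInteger(0); // runners mid-"request"
    private final AtomicInteger sent = new AtomicInteger(0);
    private volatile boolean closed = false;

    public MiniStreamingClient(int queueSize, int runnerThreads) {
        queue = new LinkedBlockingQueue<String>(queueSize);
        runners = Executors.newFixedThreadPool(runnerThreads);
        for (int i = 0; i < runnerThreads; i++) {
            runners.submit(new Runnable() {
                public void run() {
                    try {
                        while (!closed || !queue.isEmpty()) {
                            String doc = queue.poll(50, TimeUnit.MILLISECONDS);
                            if (doc == null) continue;
                            active.incrementAndGet();
                            try {
                                sent.incrementAndGet(); // stand-in for the HTTP POST
                            } finally {
                                active.decrementAndGet();
                            }
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
        }
    }

    public void add(String doc) throws InterruptedException {
        queue.put(doc); // blocks when the queue is full, like the real client
    }

    // Returns only when the queue is drained AND no runner is mid-request;
    // a runner wedged in a socket read keeps this loop spinning forever.
    public void blockUntilFinished() throws InterruptedException {
        while (!queue.isEmpty() || active.get() > 0) {
            Thread.sleep(10);
        }
    }

    public int shutdown() throws InterruptedException {
        closed = true;
        runners.shutdown();
        runners.awaitTermination(5, TimeUnit.SECONDS);
        return sent.get();
    }

    public static void main(String[] args) throws Exception {
        MiniStreamingClient client = new MiniStreamingClient(100, 2);
        for (int i = 0; i < 1000; i++) {
            client.add("doc-" + i);
        }
        client.blockUntilFinished(); // returns once the queue drains
        System.out.println("sent=" + client.shutdown());
    }
}
```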

The *problem* is that blockUntilFinished now hangs indefinitely.  From the
Java thread dumps, it appears the loop in StreamingUpdateSolrServer
believes a runner thread is still active, so it blocks (as expected).  Also
notable in the thread dump: the active runner thread looks exactly like
this:


*Hung Runner Thread:*
"pool-1-thread-8" prio=3 tid=0x00000001084c0000 nid=0xfe runnable [0xffffffff5c7fe000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:129)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
	- locked <0xfffffffe81dbcbe0> (a java.io.BufferedInputStream)
	at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
	at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
	at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
	at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
	at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154)


Although the runner thread is sitting in a socket read, there is absolutely
no other activity on the Solr clients.  Aside from the blockUntilFinished
thread, the client JVM is essentially idle.

*Recent Change:*

We increased the "maxFieldLength" from 10000(default) to 2147483647
(Integer.MAX_VALUE).

Given that this change is server-side, I don't see how it would affect
adding a new document.  I can see how it would increase commit times and
index size, but I don't see the relationship to hanging client adds.


*Ingest Workflow:*

1) Pull artifacts from relational database (PDF/TXT/Java bean)
2) Extract all searchable text fields -- this is where we use Tika,
independent of Solr
3) Using the SolrJ client, we publish each object, which is serialized to
XML and written to the master
4) Execute blockUntilFinished() for all 20 servers on each thread

5) Autocommit is set on the servers at 30 minutes or 50k documents; during
a republish, the 50k threshold is hit first.
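Steps 3-4 above can be sketched against the SolrJ 3.x API as below. The field names, record shape, and the hash-based shard routing are placeholders for illustration, not our actual publisher:

```java
import java.io.IOException;
import java.util.List;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PublishStep {
    // Step 3: serialize each record into a SolrInputDocument and queue it on
    // the master that owns it.  Step 4: drain every queue at cycle end.
    static void publishChunk(List<StreamingUpdateSolrServer> servers,
                             List<String[]> records) // [id, text] pairs (placeholder shape)
            throws SolrServerException, IOException {
        for (String[] rec : records) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rec[0]);       // field names are illustrative
            doc.addField("text", rec[1]);
            int shard = Math.abs(rec[0].hashCode()) % servers.size(); // placeholder routing
            servers.get(shard).add(doc);      // enqueued; runner threads POST it
        }
        // Step 4: returns per server only once its queue is empty and its
        // runner threads are idle -- this is the call that now hangs for us.
        for (StreamingUpdateSolrServer s : servers) {
            s.blockUntilFinished();
        }
    }
}
```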

*Environment:*

Solr v3.5.0
20 masters
2 slaves/master = 40 slaves


*Corpus:*

We have ~100MM records, ranging in size from 50MB PDFs to 1KB TXT files.
 Our schema has an unusually large number of fields (~200).  Our index
averages about 30GB/shard, totaling 600GB.


*Related Bugs:*

Our symptoms most closely match this bug, but since we are not executing
any deletes, I have low confidence that it is the same issue:
https://issues.apache.org/jira/browse/SOLR-1990


Although we have similar stack traces, we are only ADDING docs.


Thanks ahead for any input/help!

-- 
Justin Babuscio
