On 6/13/2014 12:06 PM, Tang, Rebecca wrote:
> I've been working with this issue for a while and I really don’t know what 
> the root cause is.  Any insight would be great!
>
> I have 14 million records in a mysql DB.  I grab 100,000 records from the DB 
> at a time and then use ConcurrentUpdateSolrServer (with queue size = 50 and 
> thread count = 4 and using the internally managed solr client) to write the 
> documents to the solr index.

A side note, not directly related to your problem:
ConcurrentUpdateSolrServer will swallow all indexing exceptions.  In
real terms, this means that you will *never* be notified that anything
failed - from the point of view of your SolrJ application, indexing will
always succeed, even if your Solr server is completely powered off.

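Instead of using ConcurrentUpdateSolrServer, use HttpSolrServer and
have your application do the indexing from several threads of its own.
HttpSolrServer is completely threadsafe, so one instance can be shared
by all of those threads, and its add and commit calls throw exceptions
that your code can catch.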
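
A rough sketch of that multi-threaded pattern, using only java.util.concurrent.
BatchSender, ParallelIndexer and indexAll are made-up names for illustration,
not SolrJ classes. In a real indexer the send() body would be a call such as
add(batch) on one shared HttpSolrServer instance. The point is that every
failed batch comes back to the caller instead of vanishing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a call such as add(batch) on the Solr client. */
interface BatchSender<T> {
    void send(List<T> batch) throws Exception;
}

final class ParallelIndexer {
    /**
     * Sends each batch on a fixed pool of worker threads and returns the
     * causes of all failed batches. An empty list means every batch succeeded.
     */
    static <T> List<Throwable> indexAll(List<List<T>> batches,
                                        BatchSender<T> sender,
                                        int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> pending = new ArrayList<>();
        for (List<T> batch : batches) {
            // Callable form, so checked exceptions from send() are captured.
            pending.add(pool.submit(() -> { sender.send(batch); return null; }));
        }
        pool.shutdown(); // accept no new tasks; queued ones still run

        List<Throwable> failures = new ArrayList<>();
        for (Future<?> f : pending) {
            try {
                f.get(); // blocks until this batch has finished
            } catch (ExecutionException e) {
                failures.add(e.getCause()); // the exception thrown by send()
            }
        }
        return failures;
    }
}
```

With four threads this would be called as indexAll(batches, sender, 4). What
to do with the returned failures (retry, log, abort) is up to the application.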

> If I build metadata only (I.e. Only from DB to Solr), then the index build 
> takes 4 hrs with no errors.
>
> But if I build metadata + ocr text (ocr text is stored on the file system and 
> can be very large), then the index build takes 15 – 16 hrs and often times I 
> get a few early EOF errors on the Solr server.
> From Solr.log:
> INFO  - 2014-06-13 06:28:27.113; 
> org.apache.solr.update.processor.LogUpdateProcessor; [ltdl3testperf] 
> webapp=/solr path=/update params={wt=javabin&version=2} {add=[trpy0136 
> (1470801743195406336), nfhc0136 (1470801743199600640), sfhc0136 
> (1470801743205892096), kghc0136 (1470801743218475008), zfhc0136 
> (1470801743220572160), jghc0136 (1470801743237349376), rghc0136 
> (1470801743268806656), ffhc0136 (1470801743270903808), pghc0136 
> (1470801743285583872), sghc0136 (1470801743286632448), ... (14165 adds)]} 0 
> 260102
> ERROR - 2014-06-13 06:28:27.114; org.apache.solr.common.SolrException; 
> java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] 
> early EOF
>         at 
> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)

EofException from Jetty means one specific thing:  The client software
disconnected before Solr was finished with the request and sent its
response.  Chances are good that this is because of a configured socket
timeout on your SolrJ client or its HttpClient.  This might have been
done with the setSoTimeout method on the server object.

If you must configure a socket timeout, make it VERY long -- longer than
a single request is going to take, which often means several minutes.
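
To illustrate the mechanism, here is a small sketch in plain java.net (not
SolrJ). SoTimeoutDemo and replyArrives are made-up names, and the local
"server" is only a thread that sleeps before answering. It shows a client
whose read timeout is shorter than the server's processing time giving up
before the response arrives.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SoTimeoutDemo {
    /**
     * Starts a local one-shot server that waits serverDelayMs before replying,
     * then reads the reply using the given client read timeout.
     * Returns true if the reply arrived, false if the client timed out first.
     */
    static boolean replyArrives(int serverDelayMs, int clientSoTimeoutMs) throws Exception {
        try (ServerSocket listener = new ServerSocket(0)) { // 0 = any free port
            Thread slowServer = new Thread(() -> {
                try (Socket s = listener.accept()) {
                    Thread.sleep(serverDelayMs);            // simulated long request
                    s.getOutputStream().write(1);           // the "response"
                } catch (IOException | InterruptedException ignored) {
                    // the client may already have disconnected
                }
            });
            slowServer.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                            listener.getLocalPort())) {
                client.setSoTimeout(clientSoTimeoutMs);     // read timeout, in ms
                return client.getInputStream().read() == 1;
            } catch (SocketTimeoutException e) {
                return false;                               // client gave up early
            } finally {
                slowServer.join();                          // let the server thread finish
            }
        }
    }
}
```

With a 400 ms server delay, a 100 ms timeout makes replyArrives return false,
while a 3000 ms timeout lets the reply through. The same reasoning applies to
the timeout on the SolrJ side: it has to exceed the slowest request.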

Thanks,
Shawn
