Sorry, meant to forward that to another developer at work. --wunder

On Jun 13, 2014, at 1:03 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> You can't, because it never reports them. We might be building with 
> HttpSolrServer instead.
> 
> wunder
> 
> On Jun 13, 2014, at 11:57 AM, Shawn Heisey <s...@elyograg.org> wrote:
> 
>> On 6/13/2014 12:06 PM, Tang, Rebecca wrote:
>>> I've been working with this issue for a while and I really don’t know what 
>>> the root cause is.  Any insight would be great!
>>> 
>>> I have 14 million records in a MySQL DB.  I grab 100,000 records from the 
>>> DB at a time and then use ConcurrentUpdateSolrServer (with queue size = 50 
>>> and thread count = 4 and using the internally managed Solr client) to write 
>>> the documents to the Solr index.
>> 
>> A side note, not directly related to your problem:
>> ConcurrentUpdateSolrServer will swallow all indexing exceptions.  In
>> real terms, this means that you will *never* be notified that anything
>> failed - from the point of view of your SolrJ application, indexing will
>> always succeed, even if your Solr server is completely powered off.
>> 
>> Instead of using ConcurrentUpdateSolrServer, use HttpSolrServer and
>> configure your application to do indexing with several threads. 
>> HttpSolrServer is completely threadsafe.
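
A minimal sketch of the pattern suggested above: several worker threads share one thread-safe client, and every failure is reported back to the caller instead of being swallowed. To stay self-contained it does not use SolrJ. `BatchSender` and `ParallelIndexer` are hypothetical names, and `BatchSender` is only a stand-in for a shared HttpSolrServer whose add method would be called inside `send`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a thread-safe client. In a real application the
// implementation would wrap one shared HttpSolrServer instance.
interface BatchSender {
    void send(List<String> batch) throws Exception;
}

class ParallelIndexer {
    // Sends each batch on a worker thread and returns how many batches failed.
    static int indexAll(List<List<String>> batches, BatchSender sender, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> results = new ArrayList<>();
        try {
            for (List<String> batch : batches) {
                // Returning a value makes this a Callable, so send() may throw.
                results.add(pool.submit(() -> { sender.send(batch); return null; }));
            }
            int failures = 0;
            for (Future<?> f : results) {
                try {
                    f.get(); // rethrows whatever the worker threw
                } catch (ExecutionException e) {
                    failures++;
                    System.err.println("batch failed: " + e.getCause());
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of collecting the Future objects is that a failed batch becomes visible to the application, which can then retry or abort the build. The thread count and batch size are tuning choices, not values taken from this thread.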
>> 
>>> If I build metadata only (i.e., only from DB to Solr), then the index build 
>>> takes 4 hrs with no errors.
>>> 
>>> But if I build metadata + OCR text (OCR text is stored on the file system 
>>> and can be very large), then the index build takes 15-16 hrs and I often 
>>> get a few early EOF errors on the Solr server.
>>> From Solr.log:
>>> INFO  - 2014-06-13 06:28:27.113; 
>>> org.apache.solr.update.processor.LogUpdateProcessor; [ltdl3testperf] 
>>> webapp=/solr path=/update params={wt=javabin&version=2} {add=[trpy0136 
>>> (1470801743195406336), nfhc0136 (1470801743199600640), sfhc0136 
>>> (1470801743205892096), kghc0136 (1470801743218475008), zfhc0136 
>>> (1470801743220572160), jghc0136 (1470801743237349376), rghc0136 
>>> (1470801743268806656), ffhc0136 (1470801743270903808), pghc0136 
>>> (1470801743285583872), sghc0136 (1470801743286632448), ... (14165 adds)]} 0 
>>> 260102
>>> ERROR - 2014-06-13 06:28:27.114; org.apache.solr.common.SolrException; 
>>> java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] 
>>> early EOF
>>>        at 
>>> com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>> 
>> EofException from Jetty means one specific thing:  The client software
>> disconnected before Solr was finished with the request and sent its
>> response.  Chances are good that this is because of a configured socket
>> timeout on your SolrJ client or its HttpClient.  This might have been
>> done with the setSoTimeout method on the server object.
>> 
>> If you must configure a socket timeout, make it VERY long -- longer than
>> a single request is going to take, which often means several minutes.
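
To illustrate the mechanism described above, here is a small sketch using plain java.net sockets rather than SolrJ. It only shows how a client-side read timeout that is shorter than the server's processing time makes the client give up before the response arrives. `SlowServerDemo` and the delay values are made up for the example. In SolrJ the corresponding setting is the setSoTimeout method mentioned above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SlowServerDemo {
    // Returns true if the client received a response, false if its read timed out first.
    static boolean requestSucceeds(int serverDelayMs, int clientTimeoutMs) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread worker = new Thread(() -> {
                try (Socket s = server.accept()) {
                    Thread.sleep(serverDelayMs);   // simulate a long-running request
                    s.getOutputStream().write(1);  // the "response"
                } catch (IOException | InterruptedException ignored) {
                    // the client may already have disconnected
                }
            });
            worker.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(),
                                            server.getLocalPort())) {
                client.setSoTimeout(clientTimeoutMs); // read timeout in milliseconds
                InputStream in = client.getInputStream();
                return in.read() != -1;
            } catch (SocketTimeoutException e) {
                return false; // client gave up before the server answered
            } finally {
                worker.join();
            }
        }
    }
}
```

With a 400 ms server delay, a 50 ms timeout fails while a 2000 ms timeout succeeds, which is why the timeout has to exceed the slowest expected request.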
>> 
>> Thanks,
>> Shawn
>> 
> 
> --
> Walter Underwood
> wun...@wunderwood.org
> 
> 
> 

--
Walter Underwood
wun...@wunderwood.org


