Hi Shawn,

Thanks for the reply!

Yes, the application cannot catch any errors.  I check the log for
errors and check the index to see whether the correct number of records
was indexed.  I will most likely rewrite the code to use HttpSolrServer
and handle the multi-threading myself.  ConcurrentUpdateSolrServer does
not seem like a good choice, since it swallows exceptions and the
application cannot catch them and retry if needed.  Just out of
curiosity, what are some good use cases for ConcurrentUpdateSolrServer?
I've read a lot online about this issue, and I don't see the value of
providing ConcurrentUpdateSolrServer when I see so many recommendations
against using it.
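
For reference, the rewrite I have in mind would look roughly like the
sketch below.  It is only an outline under assumptions: BatchSender is a
hypothetical stand-in for a call such as HttpSolrServer.add(docs), and the
class name, thread count and retry count are placeholders rather than
anything taken from my real code.  The point is that each batch is sent
from my own thread pool, so an exception is visible to the application
and the batch can be retried.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class RetryingIndexer {

    /** Hypothetical stand-in for a call such as HttpSolrServer.add(docs). */
    interface BatchSender<T> {
        void send(List<T> batch) throws Exception;
    }

    /**
     * Sends each batch on a fixed-size thread pool, retrying a failed batch
     * up to maxAttempts times. Returns the batches that never succeeded so
     * the caller can log or re-queue them.
     */
    static <T> List<List<T>> indexAll(List<List<T>> batches,
                                      BatchSender<T> sender,
                                      int threads,
                                      int maxAttempts) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (List<T> batch : batches) {
                results.add(pool.submit(() -> {
                    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                        try {
                            sender.send(batch);
                            return true;          // batch accepted
                        } catch (Exception e) {
                            // The failure is visible here, unlike with
                            // ConcurrentUpdateSolrServer; fall through and retry.
                        }
                    }
                    return false;                 // all attempts failed
                }));
            }

            // Collect the batches that failed, in submission order.
            List<List<T>> failed = new ArrayList<>();
            for (int i = 0; i < results.size(); i++) {
                try {
                    if (!results.get(i).get()) {
                        failed.add(batches.get(i));
                    }
                } catch (ExecutionException e) {
                    failed.add(batches.get(i));   // unexpected error in the task
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

In the real application the sender would wrap a single shared
HttpSolrServer instance, which Shawn describes as threadsafe.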

Secondly, the socket timeout is definitely something that I suspected.  We
use SolrJ's internally managed client.  I couldn't find any information on
whether a default socket timeout is set on it.  I think the default is 0.
I tried setting the socket timeout to 0 explicitly (which I think means no
timeout), and it didn't help.  My code is below:


public void setSolrServer(SolrServer solrServer) {
    this.solrServer = solrServer;
    // Socket timeout of 0, intended to mean "don't time out"
    ((ConcurrentUpdateSolrServer) this.solrServer).setSoTimeout(0);
}



Is this the correct way to set the socket timeout?  Did I do this wrong?
Maybe setting it to 0 is incorrect, and instead I need to set it to a very
large number (like you suggested)?

Thanks!

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library <legacy.library.ucsf.edu/>
E: rebecca.t...@ucsf.edu




On 6/13/14 11:57 AM, "Shawn Heisey" <s...@elyograg.org> wrote:

>On 6/13/2014 12:06 PM, Tang, Rebecca wrote:
>> I've been working with this issue for a while and I really don't know
>>what the root cause is.  Any insight would be great!
>>
>> I have 14 million records in a mysql DB.  I grab 100,000 records from
>>the DB at a time and then use ConcurrentUpdateSolrServer (with queue
>>size = 50 and thread count = 4 and using the internally managed solr
>>client) to write the documents to the solr index.
>
>A side note, not directly related to your problem:
>ConcurrentUpdateSolrServer will swallow all indexing exceptions.  In
>real terms, this means that you will *never* be notified that anything
>failed - from the point of view of your SolrJ application, indexing will
>always succeed, even if your Solr server is completely powered off.
>
>Instead of using ConcurrentUpdateSolrServer, use HttpSolrServer and
>configure your application to do indexing with several threads.
>HttpSolrServer is completely threadsafe.
>
>> If I build metadata only (I.e. Only from DB to Solr), then the index
>>build takes 4 hrs with no errors.
>>
>> But if I build metadata + ocr text (ocr text is stored on the file
>>system and can be very large), then the index build takes 15-16 hrs
>>and often times I get a few early EOF errors on the Solr server.
>> From Solr.log:
>> INFO  - 2014-06-13 06:28:27.113;
>>org.apache.solr.update.processor.LogUpdateProcessor; [ltdl3testperf]
>>webapp=/solr path=/update params={wt=javabin&version=2} {add=[trpy0136
>>(1470801743195406336), nfhc0136 (1470801743199600640), sfhc0136
>>(1470801743205892096), kghc0136 (1470801743218475008), zfhc0136
>>(1470801743220572160), jghc0136 (1470801743237349376), rghc0136
>>(1470801743268806656), ffhc0136 (1470801743270903808), pghc0136
>>(1470801743285583872), sghc0136 (1470801743286632448), ... (14165
>>adds)]} 0 260102
>> ERROR - 2014-06-13 06:28:27.114; org.apache.solr.common.SolrException;
>>java.lang.RuntimeException: [was class
>>org.eclipse.jetty.io.EofException] early EOF
>>         at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>
>EofException from Jetty means one specific thing:  The client software
>disconnected before Solr was finished with the request and sent its
>response.  Chances are good that this is because of a configured socket
>timeout on your SolrJ client or its HttpClient.  This might have been
>done with the setSoTimeout method on the server object.
>
>If you must configure a socket timeout, make it VERY long -- longer than
>a single request is going to take, which often means several minutes.
>
>Thanks,
>Shawn
>

