Hi Shawn,

Thanks for the reply!
Yes, from the application I cannot catch any errors. I check the log for errors and check the index to see if the correct number of records were built. I will most likely rewrite the code to use HttpSolrServer and do the multi-threading myself (rough sketch below). It seems like ConcurrentUpdateSolrServer is not a good server to use, since it swallows exceptions and the application cannot catch them and retry if needed.

Just out of curiosity, what are some good use cases for ConcurrentUpdateSolrServer? I've read a lot about this issue online, and I don't see the value of providing ConcurrentUpdateSolrServer when there are so many recommendations against using it.

Secondly, the socket timeout is definitely something I suspected. We use SolrJ's internally managed client, and I couldn't find any info on whether it has a default socket timeout; I think by default it's 0. I tried setting the socket timeout to 0 explicitly (which I believe means no timeout), and it didn't help the situation. My code is below:

    public void setSolrServer(SolrServer solrServer) {
        this.solrServer = solrServer;
        ((ConcurrentUpdateSolrServer) this.solrServer).setSoTimeout(0); // don't time out
    }

Is this the correct way of setting the socket timeout? Did I do something wrong? Maybe setting it to 0 is incorrect, and instead I need to set it to a very large number (like you suggested)?
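Coming back to the rewrite I mentioned at the top, here is a rough, untested sketch of what I'm planning. The URL, batch count, and the fetchBatch() helper are placeholders for my actual DB/OCR code, not anything final:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexBuilder {
        private static final int NUM_BATCHES = 140; // 14M rows / 100K rows per batch

        // HttpSolrServer is thread-safe, so one instance is shared by all workers
        private final HttpSolrServer solr =
            new HttpSolrServer("http://localhost:8983/solr/ltdl3testperf");

        public void buildIndex() throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < NUM_BATCHES; i++) {
                final int batchNo = i;
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            // placeholder for my existing DB query + OCR file reads
                            List<SolrInputDocument> docs = fetchBatch(batchNo);
                            solr.add(docs);
                        } catch (Exception e) {
                            // unlike with ConcurrentUpdateSolrServer, the failure
                            // actually reaches me here, so I can log the batch
                            // number and retry it later
                            System.err.println("batch " + batchNo + " failed: " + e);
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(24, TimeUnit.HOURS);
        }

        private List<SolrInputDocument> fetchBatch(int batchNo) {
            return new ArrayList<SolrInputDocument>(); // placeholder
        }
    }

Does that look like the right general shape?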
Thanks!

Rebecca Tang
Applications Developer, UCSF CKM
Legacy Tobacco Document Library <legacy.library.ucsf.edu/>
E: rebecca.t...@ucsf.edu

On 6/13/14 11:57 AM, "Shawn Heisey" <s...@elyograg.org> wrote:

>On 6/13/2014 12:06 PM, Tang, Rebecca wrote:
>> I've been working with this issue for a while and I really don't know
>> what the root cause is. Any insight would be great!
>>
>> I have 14 million records in a MySQL DB. I grab 100,000 records from
>> the DB at a time and then use ConcurrentUpdateSolrServer (with queue
>> size = 50, thread count = 4, and the internally managed Solr client)
>> to write the documents to the Solr index.
>
>A side note, not directly related to your problem:
>ConcurrentUpdateSolrServer will swallow all indexing exceptions. In
>real terms, this means that you will *never* be notified that anything
>failed - from the point of view of your SolrJ application, indexing will
>always succeed, even if your Solr server is completely powered off.
>
>Instead of using ConcurrentUpdateSolrServer, use HttpSolrServer and
>configure your application to do indexing with several threads.
>HttpSolrServer is completely threadsafe.
>
>> If I build metadata only (i.e. only from the DB to Solr), then the
>> index build takes 4 hrs with no errors.
>>
>> But if I build metadata + OCR text (the OCR text is stored on the
>> file system and can be very large), then the index build takes 15-16
>> hrs and oftentimes I get a few early EOF errors on the Solr server.
>> From Solr.log:
>> INFO  - 2014-06-13 06:28:27.113;
>> org.apache.solr.update.processor.LogUpdateProcessor; [ltdl3testperf]
>> webapp=/solr path=/update params={wt=javabin&version=2} {add=[trpy0136
>> (1470801743195406336), nfhc0136 (1470801743199600640), sfhc0136
>> (1470801743205892096), kghc0136 (1470801743218475008), zfhc0136
>> (1470801743220572160), jghc0136 (1470801743237349376), rghc0136
>> (1470801743268806656), ffhc0136 (1470801743270903808), pghc0136
>> (1470801743285583872), sghc0136 (1470801743286632448), ... (14165
>> adds)]} 0 260102
>> ERROR - 2014-06-13 06:28:27.114; org.apache.solr.common.SolrException;
>> java.lang.RuntimeException: [was class
>> org.eclipse.jetty.io.EofException] early EOF
>>   at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>
>EofException from Jetty means one specific thing: the client software
>disconnected before Solr was finished with the request and sent its
>response. Chances are good that this is because of a configured socket
>timeout on your SolrJ client or its HttpClient. This might have been
>done with the setSoTimeout method on the server object.
>
>If you must configure a socket timeout, make it VERY long -- longer than
>a single request is going to take, which often means several minutes.
>
>Thanks,
>Shawn
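PS: If it turns out I do need a timeout rather than disabling it, I'll take your advice and make it very long. Something like this sketch on an HttpSolrServer (the 30-minute value is just a number I picked to exceed my longest request, and the URL is a placeholder):

    HttpSolrServer solr =
        new HttpSolrServer("http://localhost:8983/solr/ltdl3testperf");
    solr.setSoTimeout(30 * 60 * 1000);      // socket read timeout: 30 minutes
    solr.setConnectionTimeout(60 * 1000);   // connect timeout: 1 minute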