NP, glad I was able to help!

Erick

On Fri, Jul 19, 2013 at 11:07 AM, Beale, Jim (US-KOP) <jim.be...@hibu.com> wrote:
> Hi Erick!
>
> Thanks for the reply. When I call server.add() it is just to add a single
> document.
>
> But, still, I think you might be correct about the size of the ultimate
> request. I decided to grab the bull by the horns by instantiating my own
> HttpClient and, in so doing, my first run changed the following parameters,
>
> SOLR_HTTP_THREAD_COUNT=4
> SOLR_MAX_BUFFERED_DOCS=10000
> SOLR_MAX_CONNECTIONS=256
> SOLR_MAX_CONNECTIONS_PER_HOST=128
> SOLR_CONNECTION_TIMEOUT=0
> SOLR_SO_TIMEOUT=0
>
> I doubled the number of emptying threads, reduced the size of the request
> buffer 5x, increased the connection limits and set the timeouts to infinite.
> (I'm not actually sure what the defaults for the timeouts were since I didn't
> see them in the Solr code and didn't track it down.)
>
> Anyway, the good news is that this combination of parameters worked. The bad
> news is that I don't know whether it was resolved by changing one or more of
> the parameters.
>
> But, regardless, I think the whole experiment verifies your thinking that the
> request was too big!
>
> Thanks again!! :)
>
>
> Jim Beale
> Lead Developer
> hibu.com
> 2201 Renaissance Boulevard, King of Prussia, PA, 19406
> Office: 610-879-3864
> Mobile: 610-220-3067
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, July 19, 2013 8:08 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing into SolrCloud
>
> Usually EOF errors indicate that the packets you're sending are too big.
>
> Wait, though. 50K is not buffered docs, I think it's buffered _requests_.
> So you're creating a queue that's ginormous and asking 2 threads to empty it.
>
> But that's not really the issue I suspect. How many documents are you adding
> at a time when you call server.add? I.e. are you using server.add(doc) or
> server.add(doclist)?
> If the latter and you're adding a bunch of docs, try
> lowering that number. If you're sending one doc at a time I'm on the
> wrong track.
>
> Best
> Erick
>
> On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) <jim.be...@hibu.com>
> wrote:
>> Hey folks,
>>
>> I've been migrating an application which indexes about 15M documents from
>> straight-up Lucene into SolrCloud. We've set up 5 Solr instances with a 3
>> zookeeper ensemble using HAProxy for load balancing. The documents are
>> processed on a quad core machine with 6 threads and indexed into SolrCloud
>> through HAProxy using ConcurrentUpdateSolrServer in order to batch the
>> updates. The indexing box is heavily-loaded during indexing but I don't
>> think it is so bad that it would cause issues.
>>
>> I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy
>> 1.4.22.
>>
>> I've been accepting the default HttpClient with 50K buffered docs and 2
>> threads, i.e.,
>>
>> int solrMaxBufferedDocs = 50000;
>> int solrThreadCount = 2;
>> solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress,
>>     solrMaxBufferedDocs, solrThreadCount);
>>
>> autoCommit is configured in the solrconfig as follows:
>>
>> <autoCommit>
>>   <maxTime>600000</maxTime>
>>   <maxDocs>500000</maxDocs>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> I'm getting the following errors on the client and server sides respectively:
>>
>> Client side:
>>
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO
>> SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught
>> when processing request: Software caused connection abort: socket write error
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO
>> SystemDefaultHttpClient - Retrying request
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO
>> SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught
>> when processing request: Software caused connection abort: socket write
>> error
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO
>> SystemDefaultHttpClient - Retrying request
>>
>> Server side:
>>
>> 7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore –
>> java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
>> early EOF
>>     at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>>     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>>     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>     at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
>>
>> When I disabled autoCommit on the server side, I didn't see any errors there
>> but I still get the issue client-side after about 2 million documents -
>> which is about 45 minutes.
>>
>> Has anyone seen this issue before? I couldn't find anything useful in the
>> usual places.
>>
>> I suppose I could set up Wireshark to see what is happening but I'm hoping
>> that someone has a better suggestion.
>>
>> Thanks in advance for any help!
>>
>>
>> Best regards,
>> Jim Beale
>>
>> hibu.com
>> 2201 Renaissance Boulevard, King of Prussia, PA, 19406
>> Office: 610-879-3864
>> Mobile: 610-220-3067
>>
>> The information contained in this email message, including any attachments,
>> is intended solely for use by the individual or entity named above and may
>> be confidential. If the reader of this message is not the intended
>> recipient, you are hereby notified that you must not read, use, disclose,
>> distribute or copy any part of this communication. If you have received this
>> communication in error, please immediately notify me by email and destroy
>> the original message, including any attachments. Thank you.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
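
For anyone reading this thread later: the point about a very large queue being emptied by only a couple of threads can be illustrated with a small, self-contained sketch. This is NOT SolrJ code and not how ConcurrentUpdateSolrServer is actually implemented; it is a simplified model using only java.util.concurrent. The class name QueueDrainSketch and the method run are hypothetical names made up for the illustration. The values 10000 and 4 mirror the settings reported as working in the thread.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Simplified model (an assumption, not SolrJ): a bounded queue drained by N threads. */
public class QueueDrainSketch {

    /** Sentinel value that tells one drain thread to stop. */
    private static final String POISON = "__STOP__";

    /**
     * Pushes totalRequests items through a queue holding at most queueSize items,
     * emptied by threadCount workers. Returns how many items the workers handled.
     */
    public static int run(int totalRequests, int queueSize, int threadCount) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService drainers = Executors.newFixedThreadPool(threadCount);

        for (int t = 0; t < threadCount; t++) {
            drainers.execute(() -> {
                try {
                    while (true) {
                        String request = queue.take();
                        if (POISON.equals(request)) {
                            return; // each worker consumes exactly one sentinel
                        }
                        processed.incrementAndGet(); // stand-in for sending one update
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            for (int i = 0; i < totalRequests; i++) {
                queue.put("doc-" + i); // blocks while the queue is full (back-pressure)
            }
            for (int t = 0; t < threadCount; t++) {
                queue.put(POISON); // queued after all real items, one per worker
            }
            drainers.shutdown();
            if (!drainers.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("drain threads did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            drainers.shutdownNow();
            throw new IllegalStateException("interrupted while feeding the queue", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        // Queue size and thread count taken from the settings that worked in this thread.
        System.out.println(run(100_000, 10_000, 4)); // prints 100000
    }
}
```

In this model the queue size only caps how much work can pile up before the producer is forced to wait; the number of drain threads decides how quickly the backlog shrinks. It says nothing about which of the changed parameters actually fixed the original problem, which the thread leaves open.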