Hi Erick! Thanks for the reply. When I call server.add() it is just to add a single document.
But still, I think you may be right about the size of the final request. I decided to take the bull by the horns and instantiate my own HttpClient. For my first run I changed the following parameters:

SOLR_HTTP_THREAD_COUNT=4
SOLR_MAX_BUFFERED_DOCS=10000
SOLR_MAX_CONNECTIONS=256
SOLR_MAX_CONNECTIONS_PER_HOST=128
SOLR_CONNECTION_TIMEOUT=0
SOLR_SO_TIMEOUT=0

That is, I doubled the number of threads emptying the queue (2 to 4), reduced the size of the request buffer 5x (50,000 to 10,000), increased the connection limits and set the timeouts to infinite. (I'm not actually sure what the defaults for the timeouts were, since I didn't see them in the Solr code and didn't track them down.)

Anyway, the good news is that this combination of parameters worked. The bad news is that I don't know which of the changes, alone or together, actually resolved it. Regardless, I think the experiment supports your thinking that the request was too big!

Thanks again!! :)

Jim Beale
Lead Developer
hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, July 19, 2013 8:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing into SolrCloud

Usually EOF errors indicate that the packets you're sending are too big.

Wait, though. 50K is not buffered docs, I think it's buffered _requests_. So you're creating a queue that's ginormous and asking 2 threads to empty it.

But that's not really the issue, I suspect. How many documents are you adding at a time when you call server.add? I.e., are you using server.add(doc) or server.add(doclist)? If the latter and you're adding a bunch of docs, try lowering that number. If you're sending one doc at a time I'm on the wrong track.

Best
Erick

On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) <jim.be...@hibu.com> wrote:
> Hey folks,
>
> I've been migrating an application which indexes about 15M documents from
> straight-up Lucene into SolrCloud. We've set up 5 Solr instances with a 3
> zookeeper ensemble using HAProxy for load balancing. The documents are
> processed on a quad core machine with 6 threads and indexed into SolrCloud
> through HAProxy using ConcurrentUpdateSolrServer in order to batch the
> updates. The indexing box is heavily-loaded during indexing but I don't
> think it is so bad that it would cause issues.
>
> I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy
> 1.4.22.
>
> I've been accepting the default HttpClient with 50K buffered docs and 2
> threads, i.e.,
>
>     int solrMaxBufferedDocs = 50000;
>     int solrThreadCount = 2;
>     solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress,
>         solrMaxBufferedDocs, solrThreadCount);
>
> autoCommit is configured in the solrconfig as follows:
>
>     <autoCommit>
>       <maxTime>600000</maxTime>
>       <maxDocs>500000</maxDocs>
>       <openSearcher>false</openSearcher>
>     </autoCommit>
>
> I'm getting the following errors on the client and server sides respectively:
>
> Client side:
>
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO
> SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught
> when processing request: Software caused connection abort: socket write error
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO
> SystemDefaultHttpClient - Retrying request
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO
> SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught
> when processing request: Software caused connection abort: socket write error
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO
> SystemDefaultHttpClient - Retrying request
>
> Server side:
>
> 7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore -
> java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException]
> early EOF
>     at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>     at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>     at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>     at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>     at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
>
> When I disabled autoCommit on the server side, I didn't see any errors there
> but I still get the issue client-side after about 2 million documents - which
> is about 45 minutes.
>
> Has anyone seen this issue before? I couldn't find anything useful on the
> usual places.
>
> I suppose I could setup wireshark to see what is happening but I'm hoping
> that someone has a better suggestion.
>
> Thanks in advance for any help!
>
>
> Best regards,
> Jim Beale
>
> hibu.com
> 2201 Renaissance Boulevard, King of Prussia, PA, 19406
> Office: 610-879-3864
> Mobile: 610-220-3067
>
> The information contained in this email message, including any attachments,
> is intended solely for use by the individual or entity named above and may be
> confidential. If the reader of this message is not the intended recipient,
> you are hereby notified that you must not read, use, disclose, distribute or
> copy any part of this communication. If you have received this communication
> in error, please immediately notify me by email and destroy the original
> message, including any attachments. Thank you.
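
For anyone who lands on this thread while using server.add(doclist) rather than one document per call: Erick's suggestion to lower the number of documents per call could be sketched roughly as below. This is only an illustration under stated assumptions, not what was done in this thread. `BatchSplitter`, `partition` and the batch size of 1000 are hypothetical names and values chosen for the example, and the actual SolrJ call is left as a comment so the snippet runs without SolrJ on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits a large list of pending documents into smaller batches. */
final class BatchSplitter {

    private BatchSplitter() {}

    /**
     * Returns consecutive batches of at most maxBatchSize elements, in order.
     * Each batch is a copy, so it stays valid if the source list changes later.
     */
    static <T> List<List<T>> partition(List<T> docs, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-in for real documents; strings keep the sketch dependency-free.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        for (List<String> batch : partition(docs, 1000)) {
            // Assumption: in real code, a call such as server.add(batch) would go here.
            System.out.println("would send " + batch.size() + " docs");
        }
    }
}
```

With 2500 documents and a batch size of 1000, the loop runs three times (1000, 1000 and 500 documents). The right batch size depends on document size and server limits, so it is something to tune rather than a recommended value.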