NP, glad I was able to help!

Erick

On Fri, Jul 19, 2013 at 11:07 AM, Beale, Jim (US-KOP)
<jim.be...@hibu.com> wrote:
> Hi Erick!
>
> Thanks for the reply.  When I call server.add() it is just to add a single 
> document.
>
> But, still, I think you might be correct about the size of the ultimate 
> request.  I decided to grab the bull by the horns by instantiating my own 
> HttpClient and, in so doing, my first run changed the following parameters,
>
> SOLR_HTTP_THREAD_COUNT=4
> SOLR_MAX_BUFFERED_DOCS=10000
> SOLR_MAX_CONNECTIONS=256
> SOLR_MAX_CONNECTIONS_PER_HOST=128
> SOLR_CONNECTION_TIMEOUT=0
> SOLR_SO_TIMEOUT=0
>
> I doubled the number of emptying threads, reduced the size of the request 
> buffer 5x, increased the connection limits and set the timeouts to infinite.  
> (I'm not actually sure what the defaults for the timeouts were since I didn't 
> see them in the Solr code and didn't track it down.)
>
> Anyway, the good news is that this combination of parameters worked.  The bad 
> news is that I don't know whether it was resolved by changing one or more of 
> the parameters.
>
> But, regardless, I think the whole experiment verifies your thinking that the 
> request was too big!
>
> Thanks again!! :)
>
>
> Jim Beale
> Lead Developer
> hibu.com
> 2201 Renaissance Boulevard, King of Prussia, PA, 19406
> Office: 610-879-3864
> Mobile: 610-220-3067
>
>
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, July 19, 2013 8:08 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Indexing into SolrCloud
>
> Usually EOF errors indicate that the packets you're sending are too big.
>
> Wait, though. 50K is not buffered docs, I think it's buffered _requests_.
> So you're creating a queue that's ginormous and asking 2 threads to empty it.
>
> But that's not really the issue I suspect. How many documents are you adding
> at a time when you call server.add? I.e. are you using server.add(doc) or
> server.add(doclist)? If the latter and you're adding a bunch of docs, try
> lowering that number. If you're sending one doc at a time I'm on the
> wrong track.
>
> Best
> Erick
>
> On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) <jim.be...@hibu.com> 
> wrote:
>> Hey folks,
>>
>> I've been migrating an application which indexes about 15M documents from 
>> straight-up Lucene into SolrCloud.  We've set up 5 Solr instances with a 3 
>> zookeeper ensemble using HAProxy for load balancing. The documents are 
>> processed on a quad core machine with 6 threads and indexed into SolrCloud 
>> through HAProxy using ConcurrentUpdateSolrServer in order to batch the 
>> updates.  The indexing box is heavily loaded during indexing but I don't 
>> think it is so bad that it would cause issues.
>>
>> I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 
>> 1.4.22.
>>
>> I've been accepting the default HttpClient with 50K buffered docs and 2 
>> threads, i.e.,
>>
>> int solrMaxBufferedDocs = 50000;
>> int solrThreadCount = 2;
>> solrServer = new ConcurrentUpdateSolrServer(
>>     solrHttpIPAddress, solrMaxBufferedDocs, solrThreadCount);
>>
>> autoCommit is configured in the solrconfig as follows:
>>
>>      <autoCommit>
>>        <maxTime>600000</maxTime>
>>        <maxDocs>500000</maxDocs>
>>        <openSearcher>false</openSearcher>
>>      </autoCommit>
>>
>> I'm getting the following errors on the client and server sides respectively:
>>
>> Client side:
>>
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  SystemDefaultHttpClient - Retrying request
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when processing request: Software caused connection abort: socket write error
>> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  SystemDefaultHttpClient - Retrying request
>>
>> Server side:
>>
>> 7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore  – java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early EOF
>>         at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>>         at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>>         at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>>         at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>>         at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
>>
>> When I disabled autoCommit on the server side, I didn't see any errors there 
>> but I still get the issue client-side after about 2 million documents - 
>> which is about 45 minutes.
>>
>> Has anyone seen this issue before?  I couldn't find anything useful in the 
>> usual places.
>>
>> I suppose I could set up Wireshark to see what is happening but I'm hoping 
>> that someone has a better suggestion.
>>
>> Thanks in advance for any help!
>>
>>
>> Best regards,
>> Jim Beale
>>
>> hibu.com
>> 2201 Renaissance Boulevard, King of Prussia, PA, 19406
>> Office: 610-879-3864
>> Mobile: 610-220-3067
>>
>> The information contained in this email message, including any attachments, 
>> is intended solely for use by the individual or entity named above and may 
>> be confidential. If the reader of this message is not the intended 
>> recipient, you are hereby notified that you must not read, use, disclose, 
>> distribute or copy any part of this communication. If you have received this 
>> communication in error, please immediately notify me by email and destroy 
>> the original message, including any attachments. Thank you.
>>
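For anyone following the thread later: the two numbers passed to ConcurrentUpdateSolrServer in the original post are a queue size and a count of threads that empty that queue. The sketch below is only a toy model of that arrangement, built on java.util.concurrent, and is not the SolrJ implementation. The class name BoundedQueueModel, the method run() and the poison-pill shutdown are all made up for illustration. It does show why the producer blocks once the queue is full, and that every queued item is drained however the two parameters are set.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (hypothetical, not SolrJ code): one producer fills a bounded
// queue while a fixed number of worker threads empty it.
public class BoundedQueueModel {
    // Sentinel telling a worker to stop; real document ids here are >= 0.
    private static final int POISON = -1;

    // Returns how many "documents" the workers drained from the queue.
    static int run(int queueSize, int threadCount, int totalDocs) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueSize);
        AtomicInteger sent = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);

        for (int t = 0; t < threadCount; t++) {
            pool.submit(() -> {
                try {
                    // Each worker stops after taking exactly one sentinel.
                    while (queue.take() != POISON) {
                        sent.incrementAndGet(); // stands in for sending a request
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            // put() blocks while the queue is full, so the producer is
            // throttled by the queue size and by how fast workers drain it.
            for (int i = 0; i < totalDocs; i++) {
                queue.put(i);
            }
            for (int t = 0; t < threadCount; t++) {
                queue.put(POISON);
            }
            pool.shutdown();
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sent.get();
    }

    public static void main(String[] args) {
        // Jim's second configuration: queue of 10000, 4 emptying threads.
        System.out.println("drained " + run(10000, 4, 100000) + " documents");
    }
}
```

In this model a smaller queue only means the producer waits sooner. It says nothing about the size of each HTTP request, which is what Erick suspected was behind the EOF errors.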
