RE: Indexing into SolrCloud

Beale, Jim (US-KOP) Fri, 19 Jul 2013 08:38:30 -0700

Hi Erick!

Thanks for the reply.  When I call server.add() it is just to add a single 
document.


But, still, I think you might be correct about the size of the ultimate 
request.  I decided to grab the bull by the horns by instantiating my own 
HttpClient and, in so doing, my first run changed the following parameters,

SOLR_HTTP_THREAD_COUNT=4
SOLR_MAX_BUFFERED_DOCS=10000
SOLR_MAX_CONNECTIONS=256
SOLR_MAX_CONNECTIONS_PER_HOST=128
SOLR_CONNECTION_TIMEOUT=0
SOLR_SO_TIMEOUT=0

I doubled the number of emptying threads, reduced the size of the request 
buffer 5x, increased the connection limits and set the timeouts to infinite.  
(I'm not actually sure what the defaults for the timeouts were since I didn't 
see them in the Solr code and didn't track it down.)
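
For anyone reading along, here is a minimal sketch of how the six settings above might be loaded and validated before they are handed to a custom HttpClient and to ConcurrentUpdateSolrServer. IndexerSettings is a hypothetical class, not part of SolrJ. The fallbacks for thread count and buffer size are the values from my original setup; the other fallbacks are placeholders, since I never tracked down the real defaults. The wiring into HttpClient itself is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the six settings listed above; not a SolrJ class.
final class IndexerSettings {
    final int httpThreadCount;       // threads emptying the request buffer
    final int maxBufferedDocs;       // size of the request buffer
    final int maxConnections;        // total HTTP connections
    final int maxConnectionsPerHost; // HTTP connections per host
    final int connectionTimeoutMs;   // 0 is treated as "no timeout"
    final int soTimeoutMs;           // 0 is treated as "no timeout"

    IndexerSettings(Properties p) {
        // Fallbacks below match the original setup (2 threads, 50000 buffered).
        httpThreadCount = intProp(p, "SOLR_HTTP_THREAD_COUNT", 2);
        maxBufferedDocs = intProp(p, "SOLR_MAX_BUFFERED_DOCS", 50000);
        // Placeholder fallbacks, NOT known library defaults.
        maxConnections = intProp(p, "SOLR_MAX_CONNECTIONS", 128);
        maxConnectionsPerHost = intProp(p, "SOLR_MAX_CONNECTIONS_PER_HOST", 32);
        connectionTimeoutMs = intProp(p, "SOLR_CONNECTION_TIMEOUT", 0);
        soTimeoutMs = intProp(p, "SOLR_SO_TIMEOUT", 0);
    }

    // Reads a non-negative integer property, using the fallback when absent.
    static int intProp(Properties p, String key, int fallback) {
        String raw = p.getProperty(key);
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        int value = Integer.parseInt(raw.trim());
        if (value < 0) {
            throw new IllegalArgumentException(key + " must be >= 0, got " + value);
        }
        return value;
    }

    // Parses KEY=VALUE lines in the same form as the list above.
    static IndexerSettings parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new IndexerSettings(p);
    }

    boolean hasInfiniteTimeouts() {
        return connectionTimeoutMs == 0 && soTimeoutMs == 0;
    }
}
```

Keeping the values in one place like this would also make it easier to change one parameter per run and find out which change actually mattered.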

Anyway, the good news is that this combination of parameters worked.  The bad 
news is that I don't know which of the changes actually resolved the problem, 
or whether it took more than one of them.

But, regardless, I think the whole experiment verifies your thinking that the 
request was too big!

Thanks again!! :)


Jim Beale
Lead Developer
hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067




-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, July 19, 2013 8:08 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexing into SolrCloud

Usually EOF errors indicate that the packets you're sending are too big.

Wait, though. 50K is not buffered docs, I think it's buffered _requests_.
So you're creating a queue that's ginormous and asking 2 threads to empty it.

But that's not really the issue I suspect. How many documents are you adding
at a time when you call server.add? I.e. are you using server.add(doc) or
server.add(doclist)? If the latter and you're adding a bunch of docs, try
lowering that number. If you're sending one doc at a time I'm on the
wrong track.
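
If it does turn out to be the doclist case, lowering the number could look something like the sketch below. It's plain JDK code, and DocBatcher and partition are names made up for illustration, not anything in SolrJ. The idea is just to cut one big list into smaller lists and send each one separately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SolrJ: splits a list into smaller batches.
final class DocBatcher {
    // Returns consecutive sub-lists of at most maxBatchSize elements each,
    // preserving the original order. An empty input yields no batches.
    static <T> List<List<T>> partition(List<T> docs, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += maxBatchSize) {
            int end = Math.min(start + maxBatchSize, docs.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then go to server.add(batch) in turn, so no single request carries the whole list. What batch size is small enough is something you'd have to find by experiment.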

Best
Erick

On Thu, Jul 18, 2013 at 2:51 PM, Beale, Jim (US-KOP) <jim.be...@hibu.com> wrote:
> Hey folks,
>
> I've been migrating an application which indexes about 15M documents from 
> straight-up Lucene into SolrCloud.  We've set up 5 Solr instances with a 
> 3-node ZooKeeper ensemble, using HAProxy for load balancing. The documents are 
> processed on a quad core machine with 6 threads and indexed into SolrCloud 
> through HAProxy using ConcurrentUpdateSolrServer in order to batch the 
> updates.  The indexing box is heavily-loaded during indexing but I don't 
> think it is so bad that it would cause issues.
>
> I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 
> 1.4.22.
>
> I've been accepting the default HttpClient with 50K buffered docs and 2 
> threads, i.e.,
>
> int solrMaxBufferedDocs = 50000;
> int solrThreadCount = 2;
> solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress, 
> solrMaxBufferedDocs, solrThreadCount);
>
> autoCommit is configured in the solrconfig as follows:
>
>      <autoCommit>
>        <maxTime>600000</maxTime>
>        <maxDocs>500000</maxDocs>
>        <openSearcher>false</openSearcher>
>      </autoCommit>
>
> I'm getting the following errors on the client and server sides respectively:
>
> Client side:
>
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
> SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught 
> when processing request: Software caused connection abort: socket write error
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
> SystemDefaultHttpClient - Retrying request
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
> SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught 
> when processing request: Software caused connection abort: socket write error
> 2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
> SystemDefaultHttpClient - Retrying request
>
> Server side:
>
> 7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore  – 
> java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] 
> early EOF
>         at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
>         at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
>         at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
>         at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
>         at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)
>
> When I disabled autoCommit on the server side, I didn't see any errors there 
> but I still get the issue client-side after about 2 million documents - which 
> is about 45 minutes.
>
> Has anyone seen this issue before?  I couldn't find anything useful in the 
> usual places.
>
> I suppose I could set up Wireshark to see what is happening but I'm hoping 
> that someone has a better suggestion.
>
> Thanks in advance for any help!
>
>
> Best regards,
> Jim Beale
>
> hibu.com
> 2201 Renaissance Boulevard, King of Prussia, PA, 19406
> Office: 610-879-3864
> Mobile: 610-220-3067
>
> The information contained in this email message, including any attachments, 
> is intended solely for use by the individual or entity named above and may be 
> confidential. If the reader of this message is not the intended recipient, 
> you are hereby notified that you must not read, use, disclose, distribute or 
> copy any part of this communication. If you have received this communication 
> in error, please immediately notify me by email and destroy the original 
> message, including any attachments. Thank you.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
