Re: Solr replication hangs on multiple slave nodes

Otis Gospodnetic Thu, 04 Oct 2012 10:54:05 -0700

Hi,

I haven't seen this error before.


Some questions/suggestions...
Have you tried with 3.6.1?
Is the disk full?
Have you tried watching the network with
http://code.google.com/p/tcpmon/ or tcpdump?
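
For the disk question, a rough check along these lines might help. It is only a sketch: the function name, the "." path, and the 2x headroom factor are placeholders I made up, based on the assumption that a slave may briefly hold both the old and the newly downloaded index during replication. Point it at the slave's real data directory.

```python
import shutil


def has_room_for_index(data_dir, index_bytes, headroom=2.0):
    """Return True if the filesystem holding data_dir has at least
    headroom * index_bytes of free space.

    headroom=2.0 is an assumed safety factor, not a Solr setting.
    """
    free = shutil.disk_usage(data_dir).free
    return free >= headroom * index_bytes


if __name__ == "__main__":
    # 12.09 GB is the download size from the report below;
    # "." is a placeholder for the slave's data directory.
    index_bytes = int(12.09 * 1024 ** 3)
    print(has_room_for_index(".", index_bytes))
```

If this prints False on one of the stalled slaves, free space is worth ruling out before digging into the network side.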

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html


On Thu, Oct 4, 2012 at 12:06 PM, Justin Babuscio
<jbabus...@linchpinsoftware.com> wrote:
> After a large index rebuild (16 masters with ~15GB each), some slaves fail
> to completely replicate.
>
> We are running Solr v3.5 with 16 masters and 2 slaves each for a total of
> 48 servers.
>
> 4 of the 32 slaves sit in a stalled replication state with similar messages:
>
> Files Downloaded:  254/260
> Downloaded: 12.09 GB / 12.09 GB [ 100% ]
> Downloading File: _t6.fdt, Downloaded: 3.1 MB / 3.1 MB [ 100 % ]
> Time Elapsed: 3215s, Estimated Time Remaining: 0s, Speed: 24.5 MB/s
>
>
> As you'll notice, the download sizes all show as complete, but the file
> count (254/260) does not.  This also prevents the servers from polling the
> masters for new updates.  When searching, we occasionally see 500
> responses from the slaves that fail to replicate.  The errors are:
>
> ArrayIndexOutOfBounds - this occurs when writing the HTTP Response (our
> container is WebSphere)
> NullPointerExceptions - org.apache.lucene.queryParser.QueryParser.parse
> (QueryParser.java:203)
>
> We have tried to stop the slave, delete the /data directory, and restart.
>  This started downloading the index but stalled as expected.
>
> Thanks,
> Justin
