Hi, I haven't seen this error before.
Some questions/suggestions... Have you tried with 3.6.1? Is the disk full? Have you tried watching the network with http://code.google.com/p/tcpmon/ or tcpdump?

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html

On Thu, Oct 4, 2012 at 12:06 PM, Justin Babuscio <jbabus...@linchpinsoftware.com> wrote:
> After a large index rebuild (16 masters with ~15GB each), some slaves fail
> to completely replicate.
>
> We are running Solr v3.5 with 16 masters and 2 slaves each for a total of
> 48 servers.
>
> 4 of the 32 slaves sit in a stalled replication state with similar messages:
>
> Files Downloaded: 254/260
> Downloaded: 12.09 GB / 12.09 GB [ 100% ]
> Downloading File: _t6.fdt, Downloaded: 3.1 MB / 3.1 MB [ 100% ]
> Time Elapsed: 3215s, Estimated Time Remaining: 0s, Speed: 24.5 MB/s
>
>
> As you'll notice, all download sizes appear to be complete but the files
> downloaded are not. This also prevents the servers from polling for a new
> update from the masters. When searching, we are occasionally seeing 500
> responses from the slaves that fail to replicate. The errors are
>
> ArrayIndexOutOfBounds - this occurs when writing the HTTP Response (our
> container is WebSphere)
> NullPointerExceptions - org.apache.lucene.queryParser.QueryParser.parse
> (QueryParser.java:203)
>
> We have tried to stop the slave, delete the /data directory, and restart.
> This started downloading the index but stalled as expected.
>
> Thanks,
> Justin
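
For the "Is the disk full?" question, here is a rough sketch of how the check could be scripted across the stalled slaves. It is only an illustration, not something taken from your setup: the function names, the host name, the `/solr` base path, the data directory path and the 2x headroom factor are all assumptions. The headroom guess is there because a slave may hold the old index and the newly downloaded one at the same time. The second helper only builds the URL for the replication handler's `details` command so you can compare what each slave reports.

```python
import shutil
from urllib.parse import urlencode


def has_room_for_index(data_dir, index_bytes, headroom=2.0):
    """Return True if the filesystem holding data_dir has at least
    headroom * index_bytes free.

    headroom=2.0 is an assumed safety factor, not a Solr requirement.
    """
    free = shutil.disk_usage(data_dir).free
    return free >= headroom * index_bytes


def replication_details_url(base_url):
    """Build the URL for the replication handler's 'details' command.

    base_url is assumed to be the Solr root (or core root),
    e.g. http://slave1:8983/solr
    """
    return base_url.rstrip("/") + "/replication?" + urlencode({"command": "details"})


if __name__ == "__main__":
    # Hypothetical values: ~15 GB index, data directory path is a guess.
    index_bytes = 15 * 1024 ** 3
    print("enough disk:", has_room_for_index(".", index_bytes))
    print("check:", replication_details_url("http://slave1:8983/solr"))
```

Run it on each stalled slave with the real data directory in place of `"."`, then open the printed URL in a browser or with curl.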