Hi,

We're testing again with today's trunk, using the new Lucene 4.1 format by 
default. As long as nodes are not restarted things are fairly stable, but 
restarting nodes leads to a lot of mayhem. It seems we can get the cluster back 
up and running by clearing ZK and restarting everything (another issue), but 
replication becomes impossible for some nodes, leaving them in a continuous 
state of failing recovery.

Here are some excerpts from the logs:

2012-10-30 16:12:39,674 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-5] - : null:java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkBounds(Buffer.java:530)
        at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:218)
        at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:91)
        at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
        at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)


2012-10-30 16:10:32,220 ERROR [solr.handler.ReplicationHandler] - [RecoveryThread] - : SnapPull failed :org.apache.solr.common.SolrException: Unable to download _x.fdt completely. Downloaded 13631488!=13843504
        at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1237)
        at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1118)
        at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)

2012-10-30 16:12:51,061 WARN [solr.handler.ReplicationHandler] - [http-8080-exec-3] - : Exception while writing response for params: file=_p_Lucene41_0.doc&command=filecontent&checksum=true&generation=6&qt=/replication&wt=filestream
java.io.EOFException: read past EOF: MMapIndexInput(path="/opt/solr/cores/openindex_h/data/index.20121030152234973/_p_Lucene41_0.doc")
        at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:100)
        at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
        at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)


Needless to say I'm puzzled, so I'm wondering if anyone has seen this before or 
has some hints that might help dig further.

Thanks,
Markus
