Hi,

We're testing again with today's trunk, using the new Lucene 4.1 format by default. As long as nodes are not restarted, things are fairly stable, but restarting nodes leads to a lot of mayhem. It seems we can get the cluster back up and running by clearing ZK and restarting everything (another issue), but replication becomes impossible for some nodes, leaving them in a continuous state of failing recovery.
Here are some excerpts from the logs:

2012-10-30 16:12:39,674 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-5] - : null:java.lang.IndexOutOfBoundsException
        at java.nio.Buffer.checkBounds(Buffer.java:530)
        at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:218)
        at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:91)
        at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
        at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)

2012-10-30 16:10:32,220 ERROR [solr.handler.ReplicationHandler] - [RecoveryThread] - : SnapPull failed :org.apache.solr.common.SolrException: Unable to download _x.fdt completely. Downloaded 13631488!=13843504
        at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1237)
        at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1118)
        at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
        at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
        at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
        at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
        at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)

2012-10-30 16:12:51,061 WARN [solr.handler.ReplicationHandler] - [http-8080-exec-3] - : Exception while writing response for params: file=_p_Lucene41_0.doc&command=filecontent&checksum=true&generation=6&qt=/replication&wt=filestream
java.io.EOFException: read past EOF: MMapIndexInput(path="/opt/solr/cores/openindex_h/data/index.20121030152234973/_p_Lucene41_0.doc")
        at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:100)
        at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
        at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)

Needless to say, I'm puzzled, so I'm wondering if anyone has seen this before or has some hints that might help dig further.

Thanks,
Markus