Hi,
We're testing again with today's trunk and using the new Lucene 4.1 format by
default. As long as nodes are not restarted, things are fairly stable, but
restarting nodes leads to a lot of mayhem. It seems we can get the cluster back
up and running by clearing ZK and restarting everything (another issue), but
replication becomes impossible for some nodes, leaving them in a continuous
state of failing recovery.
Here are some excerpts from the logs:
2012-10-30 16:12:39,674 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-5] - : null:java.lang.IndexOutOfBoundsException
    at java.nio.Buffer.checkBounds(Buffer.java:530)
    at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:218)
    at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:91)
    at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
    at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)
2012-10-30 16:10:32,220 ERROR [solr.handler.ReplicationHandler] - [RecoveryThread] - : SnapPull failed :org.apache.solr.common.SolrException: Unable to download _x.fdt completely. Downloaded 13631488!=13843504
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1237)
    at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1118)
    at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
    at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
    at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
2012-10-30 16:12:51,061 WARN [solr.handler.ReplicationHandler] - [http-8080-exec-3] - : Exception while writing response for params: file=_p_Lucene41_0.doc&command=filecontent&checksum=true&generation=6&qt=/replication&wt=filestream
java.io.EOFException: read past EOF: MMapIndexInput(path="/opt/solr/cores/openindex_h/data/index.20121030152234973/_p_Lucene41_0.doc")
    at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:100)
    at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
    at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)
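One detail in the SnapPull failure for _x.fdt may be worth checking: the two byte counts in "Downloaded 13631488!=13843504". The quick sketch below only does the arithmetic on those two numbers. The 1 MiB chunk size (MIB) is an assumption used for the comparison, not something the log itself states, and the variable names are made up for illustration.

```python
# Byte counts copied from the "Unable to download _x.fdt completely" message.
downloaded = 13631488
expected = 13843504

# Assumed chunk size for the comparison (1 MiB); not confirmed by the log.
MIB = 1024 * 1024

# How many bytes never arrived.
missing = expected - downloaded

# How many whole assumed-size chunks arrived, and whether a partial one did.
full_chunks, remainder = divmod(downloaded, MIB)

print("missing bytes:", missing)
print("full 1 MiB chunks received:", full_chunks)
print("bytes beyond the last full chunk:", remainder)
```

If the remainder is zero, the transfer stopped on an exact chunk boundary, which would suggest the sender failed while reading the final partial chunk rather than somewhere in the middle of the file. That is only a hint about where to look, not a diagnosis.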
Needless to say, I'm puzzled, so I'm wondering if anyone has seen this before or
has some hints that might help dig further.
Thanks,
Markus