Ah, we're also seeing Solr look up a nonexistent directory:
2012-10-30 16:32:26,578 ERROR [handler.admin.CoreAdminHandler] - [http-8080-exec-2] - : IO error while trying to get the size of the Directory:org.apache.lucene.store.NoSuchDirectoryException: directory '/opt/solr/cores/shard_a/data/index' does not exist
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:220)
    at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:243)
    at org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
    at org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
Instead of data/index it should be looking for data/index.20121030152324761/,
which actually does exist.
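
To confirm the mismatch on disk, it helps to list the index directories under the core's data directory. A minimal sketch, assuming Solr records the active directory name under an `index` key in `data/index.properties` (we believe the replication code writes this file, but treat that as an assumption for this trunk build); the function name is made up for illustration:

```python
from pathlib import Path


def describe_index_dirs(data_dir):
    """Report which index directories exist under a core's data directory.

    Assumption: after replication, Solr records the active directory name
    under an `index` key in data/index.properties. If that file is absent,
    the plain `index` directory is taken as the default.
    """
    data = Path(data_dir)
    # Only directories count; index.properties itself is a regular file.
    present = sorted(p.name for p in data.glob("index*") if p.is_dir())

    active = "index"
    props = data / "index.properties"
    if props.is_file():
        for line in props.read_text().splitlines():
            line = line.strip()
            if line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == "index":
                active = value.strip()

    return {
        "present": present,
        "active": active,
        "active_exists": active in present,
    }
```

Running `describe_index_dirs("/opt/solr/cores/shard_a/data")` on the affected node should show whether the directory Solr is expected to use is one of those actually present.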
-----Original message-----
> From:Markus Jelsma <[email protected]>
> Sent: Tue 30-Oct-2012 17:30
> To: [email protected]
> Subject: trunk is unable to replicate between nodes ( Unable to download ...
> completely)
>
> Hi,
>
> We're testing again with today's trunk and using the new Lucene 4.1 format by
> default. When nodes are not restarted, things are kind of stable, but
> restarting nodes leads to a lot of mayhem. It seems we can get the cluster
> back up and running by clearing ZK and restarting everything (another issue),
> but replication becomes impossible for some nodes, leading to a continuous
> state of failing recovery, etc.
>
> Here are some excerpts from the logs:
>
> 2012-10-30 16:12:39,674 ERROR [solr.servlet.SolrDispatchFilter] - [http-8080-exec-5] - : null:java.lang.IndexOutOfBoundsException
>     at java.nio.Buffer.checkBounds(Buffer.java:530)
>     at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:218)
>     at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:91)
>     at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
>     at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)
>
>
> 2012-10-30 16:10:32,220 ERROR [solr.handler.ReplicationHandler] - [RecoveryThread] - : SnapPull failed :org.apache.solr.common.SolrException: Unable to download _x.fdt completely. Downloaded 13631488!=13843504
>     at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1237)
>     at org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1118)
>     at org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:716)
>     at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:387)
>     at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:273)
>     at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:152)
>     at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
>
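
The two byte counts in the SnapPull message above can be checked directly. A small sketch (the regex and function name are ours, not anything from Solr) that pulls them out of the message and reports the shortfall and whether the download stopped on a whole-MiB boundary:

```python
import re

MIB = 1024 * 1024


def download_shortfall(message):
    """Parse 'Downloaded <got>!=<expected>' and summarise the gap."""
    match = re.search(r"Downloaded (\d+)\s*!=\s*(\d+)", message)
    if match is None:
        raise ValueError("no 'Downloaded a!=b' pair found")
    got, expected = int(match.group(1)), int(match.group(2))
    return {
        "got": got,
        "expected": expected,
        "missing": expected - got,
        # True when the bytes received are an exact multiple of 1 MiB.
        "got_is_whole_mib": got % MIB == 0,
        "got_mib": got / MIB,
    }


print(download_shortfall(
    "Unable to download _x.fdt completely. Downloaded 13631488!=13843504"
))
```

For this message the download stops at exactly 13 MiB with 212016 bytes missing. That would be consistent with the transfer failing on the final, partial chunk if the file is streamed in 1 MiB packets, which is an assumption on our part.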
> 2012-10-30 16:12:51,061 WARN [solr.handler.ReplicationHandler] - [http-8080-exec-3] - : Exception while writing response for params: file=_p_Lucene41_0.doc&command=filecontent&checksum=true&generation=6&qt=/replication&wt=filestream
> java.io.EOFException: read past EOF: MMapIndexInput(path="/opt/solr/cores/openindex_h/data/index.20121030152234973/_p_Lucene41_0.doc")
>     at org.apache.lucene.store.ByteBufferIndexInput.readBytes(ByteBufferIndexInput.java:100)
>     at org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1065)
>     at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:932)
>
>
> Needless to say, I'm puzzled, so I'm wondering if anyone has seen this before
> or has some hints that might help dig further.
>
> Thanks,
> Markus
>