We are using Solr 7.2. We have two SolrCloud clusters hosted on Google Cloud.
These are targets for an on-prem SolrCloud cluster where we run our ETL loads;
CDCR replicates the updates to the Google Cloud clusters. This mostly works
well. However, networks can fail. After a brief network outage we frequently
see corrupted tlog files: either zero-length tlog files or files that appear
to be truncated. When this happens we see lots of CDCR errors. If we find a
corrupt tlog and delete it, things go back to normal.
The frequency of the errors is troubling. CDCR needs to be more robust against
network issues. I don't know how tlogs get corrupted in this scenario, but
they evidently do.
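
For anyone wanting to spot the zero-length case quickly, here is a minimal
sketch of a scan. It assumes the transaction logs are files whose names start
with "tlog." somewhere under a directory you pass in; the function name and
the directory layout are illustrative assumptions, not something Solr
provides. It only catches empty files. It cannot detect a truncated tlog,
since that would require parsing the log contents.

```python
import os


def find_empty_tlogs(tlog_dir):
    """Return sorted paths of zero-length files named tlog.* under tlog_dir.

    Assumption: transaction log files have names starting with "tlog.".
    Truncated (non-empty) logs are NOT detected by this check.
    """
    empty = []
    for root, _dirs, files in os.walk(tlog_dir):
        for name in files:
            if not name.startswith("tlog."):
                continue  # skip anything that is not a transaction log
            path = os.path.join(root, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)
```

The function only reports candidates; it deliberately does not delete
anything, so the decision to remove a file stays with the operator.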

Today we started seeing lots of CdcrReplicator errors but could not find a
corrupt tlog. This is a trace from the logs:
java.io.EOFException
    at org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
    at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:863)
    at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:857)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:266)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:603)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:315)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:747)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:272)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.update.TransactionLog$LogReader.next(TransactionLog.java:690)
    at org.apache.solr.update.CdcrTransactionLog$CdcrLogReader.next(CdcrTransactionLog.java:304)
    at org.apache.solr.update.CdcrUpdateLog$CdcrLogReader.next(CdcrUpdateLog.java:633)
    at org.apache.solr.handler.CdcrReplicator.run(CdcrReplicator.java:77)
    at org.apache.solr.handler.CdcrReplicatorScheduler.lambda$null$0(CdcrReplicatorScheduler.java:81)
    at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Our admins restarted the source Solr servers and that seems to have helped.
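
One way to check whether the errors have really subsided after a restart is to
poll the CDCR request handler on the source collection, which accepts
action=ERRORS and action=QUEUES. The following is a rough sketch only: the
base URL and collection name are placeholders, the helper names are made up
for illustration, and it assumes the handler is registered at /cdcr and that
no authentication is required.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def cdcr_url(base_url, collection, action):
    """Build a CDCR request URL, e.g. for action ERRORS or QUEUES.

    Assumption: the CDCR handler is registered at /cdcr on the collection.
    """
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{quote(collection)}/cdcr?{query}"


def fetch_cdcr(base_url, collection, action, timeout=10):
    """Fetch and decode the JSON response for the given CDCR action."""
    with urlopen(cdcr_url(base_url, collection, action), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder host and collection; replace with the source cluster's values.
    print(json.dumps(fetch_cdcr("http://localhost:8983/solr", "mycollection", "ERRORS"), indent=2))
```

Running the same call with "QUEUES" instead of "ERRORS" shows whether the
replication queue toward each target is draining.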
