We are experiencing unexpected recovery events when a leader is sending updates to a replica. A "java.net.SocketException: Connection reset" is encountered when updating the replica, which triggers the recovery.
In our previous Solr 4.6.1 installation, update errors triggered retry logic in the SolrCmdDistributor, and the updates continued without triggering a leader-initiated recovery. In our current 4.10.2 installation, this retry no longer occurs.

It looks like the fix for https://issues.apache.org/jira/browse/SOLR-5509 removed this retry logic. See https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?r1=1546672&r2=1546164&pathrev=1546672 . This change was introduced with Solr 4.7.

The retry logic appears to have been removed while an unstable test was being investigated. I am wondering whether it should be restored for production use.

Should I open a ticket to restore the retry logic?

Thanks,
Lindsay

On 2015-01-12, 5:36 PM, "Lindsay Martin" <lmar...@abebooks.com> wrote:

>I have uncovered some additional details in the shard leader log:
>
>2015-01-11 09:38:00.693 [qtp268575911-3617101] INFO org.apache.solr.update.processor.LogUpdateProcessor [listings] webapp=/solr path=/update params{distrib.from=http://solr05.search.abebooks.com:8983/solr/listings/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[14065572860 (1490024273004199936)]} 0 707
>2015-01-11 09:38:00.913 [updateExecutor-1-thread-35734] ERROR org.apache.solr.update.StreamingSolrServers error
>java.net.SocketException: Connection reset
>
>snip
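
To make clear what kind of retry behavior is meant above, here is a minimal sketch. It is not the actual SolrCmdDistributor code: the class name `RetrySketch`, the method `sendWithRetry`, and the `maxRetries` and `backoffMillis` parameters are all hypothetical and exist only for illustration. The idea is that a transient "Connection reset" gets retried a bounded number of times, and only a persistent failure is passed on to the caller, which could then decide whether to start recovery.

```java
import java.net.SocketException;
import java.util.concurrent.Callable;

// Illustrative only: hypothetical names, not Solr's implementation.
public class RetrySketch {

    /**
     * Runs the request, retrying up to maxRetries times when it fails with a
     * SocketException (for example "Connection reset"). Any other exception
     * propagates immediately without a retry.
     */
    public static <T> T sendWithRetry(Callable<T> request, int maxRetries, long backoffMillis)
            throws Exception {
        int attempt = 0;
        while (true) {
            try {
                return request.call();
            } catch (SocketException e) {
                if (attempt >= maxRetries) {
                    // Retries exhausted: surface the error so the caller can
                    // decide what to do next (for example, start recovery).
                    throw e;
                }
                attempt++;
                Thread.sleep(backoffMillis); // fixed pause before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated update request that fails twice, then succeeds.
        String result = sendWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new SocketException("Connection reset");
            }
            return "ok";
        }, 3, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

With a retry budget of 3, the simulated request in `main` succeeds on its third attempt. With a budget of 1, the same request would rethrow the `SocketException` to the caller.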