[ https://issues.apache.org/jira/browse/SOLR-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070744#comment-17070744 ]
Lucene/Solr QA commented on SOLR-14356: --------------------------------------- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || || || || || {color:brown} Prechecks {color} || | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 36s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green} 2m 48s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 71m 31s{color} | {color:red} core in the patch failed. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 78m 32s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | solr.cloud.ShardRoutingTest | \\ \\ || Subsystem || Report/Notes || | JIRA Issue | SOLR-14356 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12998164/SOLR-14356.patch | | Optional Tests | compile javac unit ratsources checkforbiddenapis validatesourcepatterns | | uname | Linux lucene2-us-west.apache.org 4.4.0-170-generic #199-Ubuntu SMP Thu Nov 14 01:45:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | ant | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh | | git revision | master / 5c2011a | | ant | version: Apache Ant(TM) version 1.9.6 compiled on July 20 2018 | | Default Java | LTS | | unit | https://builds.apache.org/job/PreCommit-SOLR-Build/729/artifact/out/patch-unit-solr_core.txt | | Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/729/testReport/ | | modules | C: solr/core U: solr/core | | Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/729/console | | Powered by | Apache Yetus 0.7.0 http://yetus.apache.org | This message was automatically generated. > PeerSync with hanging nodes > --------------------------- > > Key: SOLR-14356 > URL: https://issues.apache.org/jira/browse/SOLR-14356 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Cao Manh Dat > Priority: Major > Attachments: SOLR-14356.patch, SOLR-14356.patch > > > Right now in {{PeerSync}} (during leader election), in case of exception on > requesting versions to a node, we will skip that node if exception is one the > following type > * ConnectTimeoutException > * NoHttpResponseException > * SocketException > Sometime the other node basically hang but still accept connection. In that > case SocketTimeoutException is thrown and we consider the {{PeerSync}} > process as failed and the whole shard just basically leaderless forever (as > long as the hang node still there). > We can't just blindly adding {{SocketTimeoutException}} to above list, since > [~shalin] mentioned that sometimes timeout can happen because of genuine > reasons too e.g. temporary GC pause. > I think the general idea here is we obey {{leaderVoteWait}} restriction and > retry doing sync with others in case of connection/timeout exception happen. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org