[ https://issues.apache.org/jira/browse/SOLR-15028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amy Bai updated SOLR-15028:
---------------------------
    Description:
I found that SolrCloud does not check I/O status as long as the SolrCloud process is alive.

For example, if I delete the data directory of one of the SolrCloud nodes, no errors are reported, and I can still log in to the SolrCloud Admin UI to create and query collections. The SolrCloud Admin UI shows the collections' status as green.
After that, index and search queries keep failing because that node's data directory is gone, but the node is not marked as down.
The replicas on the failed node are not working, but index and search queries do not fail over to other healthy replicas.

The error response is shown below:
"""
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:18983/solr/demo.public.test/update/json/docs' \
  --data-binary '{ "a": "1", }'
{
  "responseHeader":{
    "status":400,
    "QTime":6},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.nio.file.NoSuchFileException"],
    "msg":"Error inserting document: ",
    "code":400}}
"""

  was:
I found that SolrCloud does not check I/O status as long as the SolrCloud process is alive.

For example, if I delete the SolrCloud data directory, no errors are reported, and I can still log in to the SolrCloud Admin UI to create and query collections.
After that, index and search queries keep failing because one of the node data directories is gone, but the node is not marked as down.
The replicas on the failed node are not working, but index and search queries do not fail over to other healthy replicas.

The error response is shown below:
"""
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:18983/solr/demo.public.test/update/json/docs' \
  --data-binary '{ "a": "1", }'
{
  "responseHeader":{
    "status":400,
    "QTime":6},
  "error":{
    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.nio.file.NoSuchFileException"],
    "msg":"Error inserting document: ",
    "code":400}}
"""

> SolrCloud shows cluster still healthy without failover even though the node
> data directory is deleted
> ---------------------------------------------------------------------------
>
>                 Key: SOLR-15028
>                 URL: https://issues.apache.org/jira/browse/SOLR-15028
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: 7.4
>            Reporter: Amy Bai
>            Priority: Major
>
> I found that SolrCloud does not check I/O status as long as the SolrCloud
> process is alive.
>
> For example, if I delete the data directory of one of the SolrCloud nodes,
> no errors are reported, and I can still log in to the SolrCloud Admin UI to
> create and query collections. The SolrCloud Admin UI shows the collections'
> status as green.
> After that, index and search queries keep failing because that node's data
> directory is gone, but the node is not marked as down.
> The replicas on the failed node are not working, but index and search
> queries do not fail over to other healthy replicas.
>
> The error response is shown below:
> """
> curl -X POST -H 'Content-Type: application/json' \
>   'http://localhost:18983/solr/demo.public.test/update/json/docs' \
>   --data-binary '{ "a": "1", }'
> {
>   "responseHeader":{
>     "status":400,
>     "QTime":6},
>   "error":{
>     "metadata":[
>       "error-class","org.apache.solr.common.SolrException",
>       "root-error-class","java.nio.file.NoSuchFileException"],
>     "msg":"Error inserting document: ",
>     "code":400}}
> """
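
The error response quoted in the report carries its root cause in `error.metadata`, as a flat list of alternating keys and values. As a minimal client-side sketch (not part of Solr or SolrJ; the function names `solr_error_metadata` and `is_missing_file_error` are hypothetical and only assume the response shape shown in the report), a caller could inspect that field to recognise this particular failure, for instance before retrying against another replica:

```python
import json


def solr_error_metadata(body: str) -> dict:
    """Return the error metadata of a Solr JSON response as a dict.

    Assumes the shape shown in the report: "metadata" is a flat list of
    alternating keys and values. Returns {} when there is no error block.
    """
    doc = json.loads(body)
    flat = doc.get("error", {}).get("metadata", [])
    # Pair up [k1, v1, k2, v2, ...] into {k1: v1, k2: v2, ...}.
    return dict(zip(flat[0::2], flat[1::2]))


def is_missing_file_error(body: str) -> bool:
    """True if the root cause reported by the node is a missing file."""
    meta = solr_error_metadata(body)
    return meta.get("root-error-class") == "java.nio.file.NoSuchFileException"


# The response body from the report.
RESPONSE = """
{ "responseHeader": { "status":400, "QTime":6},
  "error":{ "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","java.nio.file.NoSuchFileException"],
    "msg":"Error inserting document: ", "code":400}}
"""

if __name__ == "__main__":
    print(is_missing_file_error(RESPONSE))
```

This only detects the condition on the client side; it does not change how the cluster reports node or replica health.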