I think it's worth remembering that LBSolrClient, and its design, 
pre-dates SolrCloud and all of the ZK plumbing we now have for knowing 
when nodes & replicas are "live" ... it was written at a time when people 
had to manually specify the list of solr servers and cores themselves 
when sending requests.
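
For context, that standalone usage looks roughly like this (SolrJ 8.x-ish 
builder API; the exact builder methods vary across versions, so treat 
this as a sketch, not gospel):

  // The caller hand-picks the full core URLs, and LBHttpSolrClient
  // round-robins requests across them (tracking "zombies" itself).
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.LBHttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ManualLbExample {
    public static void main(String[] args) throws Exception {
      try (LBHttpSolrClient lb = new LBHttpSolrClient.Builder()
               .withBaseSolrUrls("http://solr1:8983/solr/collection1",
                                 "http://solr2:8983/solr/collection1")
               .build()) {
        QueryResponse rsp = lb.query(new SolrQuery("*:*"));
        System.out.println("numFound=" + rsp.getResults().getNumFound());
      }
    }
  }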

Then when SolrCloud was added, the "zk aware" CloudSolrClient logic was 
wrapped AROUND LBSolrClient -- CloudSolrClient already has some idea which 
nodes & replicas are "live" when it sends the request, but LBSolrClient 
doesn't, so...

: out when there's a wide problem.  I think that LBSolrClient ought to know
: about the nodes and should try a node level healthcheck ping before
: executing any core level requests.  Maybe if the healthcheck failed then
: succeeded, and if all of a small sample of zombie cores there pass, assume
: they will all pass (don't send pings to all).  Just a rough idea.
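
(Just to make sure i'm reading that right, i take the zombie recovery 
half of that suggestion to be something like the hypothetical sketch 
below -- every name in it is made up, it's only here to make the 
comparison with what i'm about to suggest concrete...)

  import java.util.List;
  import java.util.Map;

  // Hypothetical sketch of the quoted idea (all names invented): before
  // retrying "zombie" cores, do one node level healthcheck per node, then
  // spot check a small sample of that node's zombie cores -- if those
  // pass, assume the rest of that node's zombies are healthy again.
  abstract class NodeAwareZombieRecovery {

    // e.g. hit <nodeUrl>/admin/info/system and check for a 200
    abstract boolean nodeLevelPing(String nodeUrl);

    // e.g. a ping request against one core URL
    abstract boolean coreLevelPing(String coreUrl);

    // remove the core from the zombie list so normal traffic resumes
    abstract void markAlive(String coreUrl);

    void reviveZombies(Map<String, List<String>> zombieCoreUrlsByNode) {
      for (Map.Entry<String, List<String>> e : zombieCoreUrlsByNode.entrySet()) {
        if (!nodeLevelPing(e.getKey())) {
          continue; // whole node still looks down, don't ping its cores
        }
        List<String> zombies = e.getValue();
        List<String> sample = zombies.subList(0, Math.min(3, zombies.size()));
        if (sample.stream().allMatch(this::coreLevelPing)) {
          zombies.forEach(this::markAlive);
        }
      }
    }
  }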

...i think it's worth considering an inverse idea: make it configurable 
(and probably change the default, given that the common use case is 
SolrCloud) to build an LBSolrClient that does *NO* zombie tracking at all 
-- it just uses the multiple URLs it's given for each request as retry 
options on (certain types of) failures.
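
Roughly, the request loop in that mode boils down to something like this 
(purely hypothetical sketch, all names made up -- it's just the shape of 
the idea, not a patch):

  import java.io.IOException;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrRequest;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.common.util.NamedList;

  // With zombie tracking disabled, each request just walks the URL list
  // it was handed and falls through to the next URL on retryable
  // failures.  No alive/zombie state is shared between requests.
  abstract class NoZombieTrackingLoop {

    // e.g. send the request to one concrete base URL via an Http2SolrClient
    abstract NamedList<Object> doRequest(String baseUrl, SolrRequest<?> req)
        throws SolrServerException, IOException;

    // e.g. connection refused, read timeout, 502/503 ... but *not* a 400
    // or a real query error, which should go straight back to the caller
    abstract boolean isRetryable(Exception e);

    NamedList<Object> request(SolrRequest<?> req, List<String> urls)
        throws SolrServerException, IOException {
      Exception lastFailure = null;
      for (String url : urls) {
        try {
          return doRequest(url, req);
        } catch (SolrServerException | IOException e) {
          if (!isRetryable(e)) {
            throw e;
          }
          lastFailure = e; // note it and move on to the next URL in the list
        }
      }
      throw new SolrServerException("request failed on all supplied URLs",
                                    lastFailure);
    }
  }

The point being: whether a URL is worth trying gets decided per request 
by whoever built the URL list, not by state the LB client accumulated 
from earlier requests.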

Leave the "live" node/replica tracking to the CloudSolrClient layer, and 
if there are code paths where CloudSolrClient may be passing LBSolrClient 
stale lists of replica URLs that it (should) already know are not alive 
(via its zk watchers), let's treat those as bugs in CloudSolrClient and 
fix them.



-Hoss
http://www.lucidworks.com/
