Hi Nicolas,

Restarting the node does not help. The node keeps trying to recover and always fails.

Here is the log:

2019-07-31 06:10:08.049 INFO (coreZkRegister-1-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.ZkController Core needs to recover:parts_shard30_replica_n2697
2019-07-31 06:10:08.050 INFO (updateExecutor-3-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.u.DefaultSolrCoreState Running recovery
2019-07-31 06:10:08.056 INFO (recoveryExecutor-4-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.RecoveryStrategy Starting recovery process. recoveringAfterStartup=true
2019-07-31 06:10:08.261 INFO (recoveryExecutor-4-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.RecoveryStrategy startupVersions size=49956 range=[1640550593276674048 to 1640542396328443904]
2019-07-31 06:10:08.328 INFO (qtp689401025-58) o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/key params={omitHeader=true&wt=json} status=0 QTime=0
2019-07-31 06:10:09.276 INFO (recoveryExecutor-4-thread-1-processing-n:replica_host:8983_solr x:parts_shard30_replica_n2697 c:parts s:shard30 r:core_node2698) x:parts_shard30_replica_n2697 o.a.s.c.RecoveryStrategy Failed to connect leader http://hostname:8983/solr on recovery, try again

The ping request is issued by Solr itself, not by a script, so there is no way to stop it.
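
To make the failure mode concrete, here is a minimal standalone sketch in plain Java. It is not Solr code and does not use SolrJ. It assumes a ping that always takes longer than a fixed timeout and shows that retrying with the same budget never succeeds, which would match the repeated "Failed to connect leader" line in the log. The class name PingBudgetSketch, its methods, and the durations in main are all hypothetical and scaled down; only the idea of a fixed timeout with a short pause between attempts comes from the recovery behaviour discussed in this thread.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only: simulates a slow ping against a fixed timeout.
final class PingBudgetSketch {

    /** Returns true if a simulated ping lasting pingDurationMs finishes within timeoutMs. */
    static boolean pingWithin(long pingDurationMs, long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // The sleep stands in for the time the leader needs to answer the ping query.
        Future<?> ping = pool.submit(() -> {
            Thread.sleep(pingDurationMs);
            return null;
        });
        try {
            ping.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // budget exceeded, comparable to a socket timeout
        } catch (ExecutionException e) {
            return false; // the simulated ping itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            pool.shutdownNow(); // stop the simulated ping if it is still running
        }
    }

    /**
     * Retries the simulated ping up to maxAttempts times, pausing pauseMs between
     * failures. Returns the 1-based attempt that succeeded, or -1 if none did.
     */
    static int attemptsUntilSuccess(long pingDurationMs, long timeoutMs,
                                    long pauseMs, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (pingWithin(pingDurationMs, timeoutMs)) {
                return attempt;
            }
            try {
                Thread.sleep(pauseMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Slow ping (500 ms) against a 50 ms budget: every attempt times out.
        System.out.println("slow ping: " + attemptsUntilSuccess(500, 50, 10, 3));
        // Fast ping (10 ms) against a 500 ms budget: succeeds immediately.
        System.out.println("fast ping: " + attemptsUntilSuccess(10, 500, 10, 3));
    }
}
```

With the values in main, the slow case should report that no attempt succeeded and the fast case should succeed on the first attempt, although exact timing depends on the machine. The point is that a fixed timeout cannot be rescued by retrying; only a faster ping query changes the outcome.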

Code where the timeout is hardcoded to 1 second:

    try (HttpSolrClient httpSolrClient = new HttpSolrClient.Builder(leaderReplica.getCoreUrl())
        .withSocketTimeout(1000)
        .withConnectionTimeout(1000)
        .withHttpClient(cc.getUpdateShardHandler().getRecoveryOnlyHttpClient())
        .build()) {
      SolrPingResponse resp = httpSolrClient.ping();
      return leaderReplica;
    } catch (IOException e) {
      log.info("Failed to connect leader {} on recovery, try again", leaderReplica.getBaseUrl());
      Thread.sleep(500);
    } catch (Exception e) {
      if (e.getCause() instanceof IOException) {
        log.info("Failed to connect leader {} on recovery, try again", leaderReplica.getBaseUrl());
        Thread.sleep(500);
      } else {
        return leaderReplica;
      }
    }

On Mon, Aug 5, 2019 at 1:19 PM Nicolas Franck <nicolas.fra...@ugent.be> wrote:

> If the ping request handler is taking too long,
> and the server is not recovering automatically,
> there is not much you can do automatically on that server.
> You have to intervene manually, and restart Solr on that node.
>
> First of all: the ping is just an internal check. If it takes too long
> to respond, the requester (i.e. the script calling it) should stop
> the request and mark that node as problematic. If there are,
> for example, memory problems, every subsequent request will only
> aggravate the problem, and Solr cannot recover from that.
>
> > On 5 Aug 2019, at 06:15, dinesh naik <dineshkumarn...@gmail.com> wrote:
> >
> > Thanks Jörn, Erick and Furkan.
> >
> > I have already defined the ping request handler in solrconfig.xml as
> > below:
> >
> > <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
> >   <lst name="invariants">
> >     <str name="qt">/select</str><!-- handler to delegate to -->
> >     <str name="q">_root_:abc</str>
> >   </lst>
> > </requestHandler>
> >
> > My question is regarding the custom query being used. Here I am querying
> > the field _root_, which is available in all of my clusters and defined
> > as a string field.
> > The result for _root_:abc might not return any match (I am OK with
> > finding no matches, but the query should not take 10-15 seconds to
> > respond).
> >
> > If the response comes back within 1 second, the core recovery issue is
> > solved, so I need your suggestion: is it fine to use the _root_ field in
> > the custom query?
> >
> > On Mon, Aug 5, 2019 at 2:49 AM Furkan KAMACI <furkankam...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> You can change invariants i.e. *qt* and *q* of a *PingRequestHandler*:
> >>
> >> <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
> >>   <lst name="invariants">
> >>     <str name="qt">/search</str><!-- handler to delegate to -->
> >>     <str name="q">some test query</str>
> >>   </lst>
> >> </requestHandler>
> >>
> >> Check documentation for more info:
> >>
> >> https://lucene.apache.org/solr/7_6_0//solr-core/org/apache/solr/handler/PingRequestHandler.html
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Sat, Aug 3, 2019 at 4:17 PM Erick Erickson <erickerick...@gmail.com> wrote:
> >>
> >>> You can also (I think) explicitly define the ping request handler in
> >>> solrconfig.xml to do something else.
> >>>
> >>>> On Aug 2, 2019, at 9:50 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> >>>>
> >>>> Not sure if this is possible, but why not create a query handler in Solr
> >>>> with any custom query and use that as a ping replacement?
> >>>>
> >>>>> On 02.08.2019 at 15:48, dinesh naik <dineshkumarn...@gmail.com> wrote:
> >>>>>
> >>>>> Hi all,
> >>>>> I have a few clusters with huge data sets, and whenever a node goes down
> >>>>> it is not able to recover, for the reasons below:
> >>>>>
> >>>>> 1. The ping request handler takes more than 10-15 seconds to respond.
> >>>>> The caller, however, expects it to return in less than 1 second and
> >>>>> fails the recovery request if it does not respond in that time.
> >>>>> Therefore recovery never starts.
> >>>>>
> >>>>> 2. The soft commit interval is very low, i.e. 5 seconds. This is a
> >>>>> business requirement, so not much can be done here.
> >>>>>
> >>>>> As the standard/default admin/ping request handler uses *:* queries,
> >>>>> the response time is much higher, and I am looking for an option to
> >>>>> change this so that the ping handler returns results within a few
> >>>>> milliseconds.
> >>>>>
> >>>>> Here is an example of the standard query time:
> >>>>>
> >>>>> ----snip---
> >>>>> curl "http://hostname:8983/solr/parts/select?indent=on&q=*:*&rows=0&wt=json&distrib=false&debug=timing"
> >>>>> {
> >>>>>   "responseHeader":{
> >>>>>     "zkConnected":true,
> >>>>>     "status":0,
> >>>>>     "QTime":16620,
> >>>>>     "params":{
> >>>>>       "q":"*:*",
> >>>>>       "distrib":"false",
> >>>>>       "debug":"timing",
> >>>>>       "indent":"on",
> >>>>>       "rows":"0",
> >>>>>       "wt":"json"}},
> >>>>>   "response":{"numFound":1329638799,"start":0,"docs":[]
> >>>>>   },
> >>>>>   "debug":{
> >>>>>     "timing":{
> >>>>>       "time":16620.0,
> >>>>>       "prepare":{
> >>>>>         "time":0.0,
> >>>>>         "query":{
> >>>>>           "time":0.0},
> >>>>>         "facet":{
> >>>>>           "time":0.0},
> >>>>>         "facet_module":{
> >>>>>           "time":0.0},
> >>>>>         "mlt":{
> >>>>>           "time":0.0},
> >>>>>         "highlight":{
> >>>>>           "time":0.0},
> >>>>>         "stats":{
> >>>>>           "time":0.0},
> >>>>>         "expand":{
> >>>>>           "time":0.0},
> >>>>>         "terms":{
> >>>>>           "time":0.0},
> >>>>>         "block-expensive-queries":{
> >>>>>           "time":0.0},
> >>>>>         "slow-query-logger":{
> >>>>>           "time":0.0},
> >>>>>         "debug":{
> >>>>>           "time":0.0}},
> >>>>>       "process":{
> >>>>>         "time":16619.0,
> >>>>>         "query":{
> >>>>>           "time":16619.0},
> >>>>>         "facet":{
> >>>>>           "time":0.0},
> >>>>>         "facet_module":{
> >>>>>           "time":0.0},
> >>>>>         "mlt":{
> >>>>>           "time":0.0},
> >>>>>         "highlight":{
> >>>>>           "time":0.0},
> >>>>>         "stats":{
> >>>>>           "time":0.0},
> >>>>>         "expand":{
> >>>>>           "time":0.0},
> >>>>>         "terms":{
> >>>>>           "time":0.0},
> >>>>>         "block-expensive-queries":{
> >>>>>           "time":0.0},
> >>>>>         "slow-query-logger":{
> >>>>>           "time":0.0},
> >>>>>         "debug":{
> >>>>>           "time":0.0}}}}}
> >>>>>
> >>>>> ----snap----
> >>>>>
> >>>>> Can we use the query _root_:abc in the ping request handler? I tried
> >>>>> this query: it returns results within a few milliseconds, and the nodes
> >>>>> are able to recover without any issue.
> >>>>>
> >>>>> We want to use the _root_ field for querying because this field is
> >>>>> available in all our clusters with the definition below:
> >>>>>
> >>>>> <field name="_root_" type="string" omitNorms="true" indexed="true"
> >>>>>        termOffsets="false" stored="false" termPayloads="false"
> >>>>>        termPositions="false" docValues="false" termVectors="false"/>
> >>>>>
> >>>>> Could you please let me know if using _root_ for querying in the
> >>>>> PingRequestHandler will cause any problem?
> >>>>>
> >>>>> <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
> >>>>>   <lst name="invariants">
> >>>>>     <str name="qt">/select</str><!-- handler to delegate to -->
> >>>>>     <str name="q">_root_:abc</str>
> >>>>>   </lst>
> >>>>> </requestHandler>
> >>>>>
> >>>>> --
> >>>>> Best Regards,
> >>>>> Dinesh Naik
> >
> > --
> > Best Regards,
> > Dinesh Naik

--
Best Regards,
Dinesh Naik