Could we expose some high-level recovery information as part of the Metrics API? Then people could track the number of cores recovering, recovery time, recovery phase, the number of failed recoveries, etc., and also build alerts on top of that.
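To make the proposal a bit more concrete, here is a minimal sketch of the kind of per-node bookkeeping such metrics could be built on. Everything in it is hypothetical: the `RecoveryTracker` class, the phase names (loosely modelled on Erick's scenarios quoted below), and the alert thresholds. It is not part of Solr's existing Metrics API.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Tuple


class RecoveryPhase(Enum):
    # Hypothetical phases, loosely following the recovery scenarios in this thread.
    WAITING_FOR_LEADER = "waiting_for_leader"
    PEER_SYNC = "peer_sync"
    TLOG_REPLAY = "tlog_replay"
    FULL_INDEX_COPY = "full_index_copy"


@dataclass
class RecoveryTracker:
    """Hypothetical recovery bookkeeping for one node; NOT an existing Solr API."""
    clock: Callable[[], float] = time.monotonic
    # core name -> (current phase, timestamp when recovery started)
    active: Dict[str, Tuple[RecoveryPhase, float]] = field(default_factory=dict)
    durations: List[float] = field(default_factory=list)  # successful recoveries
    failed: int = 0

    def start(self, core: str, phase: RecoveryPhase = RecoveryPhase.WAITING_FOR_LEADER) -> None:
        self.active[core] = (phase, self.clock())

    def set_phase(self, core: str, phase: RecoveryPhase) -> None:
        # Keep the original start time; only the phase changes.
        _, started = self.active[core]
        self.active[core] = (phase, started)

    def finish(self, core: str, success: bool = True) -> None:
        _, started = self.active.pop(core)
        if success:
            self.durations.append(self.clock() - started)
        else:
            self.failed += 1

    def snapshot(self) -> dict:
        """Flat gauge/counter values, as a metrics reporter might expose them."""
        now = self.clock()
        return {
            "cores_recovering": len(self.active),
            "recoveries_completed": len(self.durations),
            "recoveries_failed": self.failed,
            "longest_running_recovery_seconds": max(
                (now - started for _, started in self.active.values()), default=0.0
            ),
            "phases": {core: phase.value for core, (phase, _) in self.active.items()},
        }

    def should_alert(self, max_seconds: float = 600.0, max_failed: int = 0) -> bool:
        # Example alert rule: a recovery running too long, or any failed recovery.
        snap = self.snapshot()
        return (snap["longest_running_recovery_seconds"] > max_seconds
                or snap["recoveries_failed"] > max_failed)
```

Passing a fake clock (for example `RecoveryTracker(clock=lambda: t[0])`) makes the durations deterministic in tests; in a real node the default monotonic clock would be used and `snapshot()` polled by whatever reporter publishes the metrics.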
Jan Høydahl

> On 6 Feb 2020, at 19:42, Erick Erickson <erickerick...@gmail.com> wrote:
>
> There’s actually a crying need for this, but nothing exists yet; basically you have to look at the log files and try to figure it out.
>
> Actually, I think this would be a great thing to work on, but it’d be pretty much all new. If you’d like, you can create a Solr Improvement Proposal (SIP) here:
> https://cwiki.apache.org/confluence/display/SOLR/SIP+Template
> to flesh out what this would look like.
>
> A couple of thoughts off the top of my head:
>
> I really think the most useful thing would be a Collections API command, something like “RECOVERYSTATUS”, or maybe an extension of CLUSTERSTATUS. Currently a replica can get stuck in recovery and never get out. There are several scenarios that would have to be considered:
>
> 1> Normal startup. The replica goes from down->recovering->active, which should be quick.
> 1a> Waiting for a leader to be elected before continuing.
>
> 2> “Peer sync”, where another replica is replaying documents from the tlog.
>
> 3> Situations where the replica is replaying documents from its own tlog. This can take a very, very, very long time too.
>
> 4> Full sync, where it’s copying the entire index from a leader.
>
> 5> Knickers in a knot: it’s given up even trying to recover.
>
> In any case, you’d want to be able to report “all ok” if nothing is in recovery, “just the ones having trouble”, and “everything, because I want to look”.
>
> But like I said, there’s nothing really built into the system to accomplish this now that I know of.
>
> Best,
> Erick
>
>> On Feb 6, 2020, at 12:15 PM, dj-manning <derek.mann...@superna.net> wrote:
>>
>> Erick Erickson wrote
>>> When you say “look”, where are you looking from? HTTP requests? SolrJ? The admin UI?
>>
>> I'm open to looking from anywhere: an HTTP request, the admin UI, or following a log if possible.
>>
>> My objective would be to follow/watch Solr's recovery progress interactively, as a human, if that's even possible.
>>
>> A stretch goal would be to report on recovery progress automatically.
>>
>> The question stems from seeing recovery in the log or the admin UI, and then wondering how far along it is.
>>
>> Much appreciated.
>>
>> --
>> Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
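Until something like RECOVERYSTATUS exists, one partial workaround for watching recovery over HTTP is to poll the existing Collections API `CLUSTERSTATUS` action and look at each replica's `state` (`active`, `recovering`, `down`, `recovery_failed`). This only tells you *which* replicas are recovering, not how far along they are, so it does not replace the progress reporting discussed above. The sketch below groups replicas by state and mimics Erick's three report styles; the base URL `http://localhost:8983/solr` and the mode names `"ok"`, `"trouble"` and `"all"` are assumptions for illustration.

```python
import json
import urllib.request
from typing import Iterator, Tuple

# Any replica state other than "active" is treated here as needing attention.
TROUBLE_STATES = {"recovering", "down", "recovery_failed"}


def fetch_cluster_status(base_url: str = "http://localhost:8983/solr") -> dict:
    """Fetch CLUSTERSTATUS as JSON. The base URL is an assumption; adjust as needed."""
    url = base_url + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def replica_states(status: dict) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (collection, shard, replica, effective_state) from a CLUSTERSTATUS response."""
    cluster = status.get("cluster", {})
    live = set(cluster.get("live_nodes", []))
    for coll_name, coll in cluster.get("collections", {}).items():
        for shard_name, shard in coll.get("shards", {}).items():
            for rep_name, rep in shard.get("replicas", {}).items():
                state = rep.get("state")
                # A replica on a node that is not live cannot be serving,
                # whatever its last published state says.
                if live and rep.get("node_name") not in live:
                    state = "down"
                yield coll_name, shard_name, rep_name, state


def recovery_report(status: dict, mode: str = "trouble") -> dict:
    """Hypothetical report modes: 'ok' (summary), 'trouble' (problem replicas), 'all'."""
    rows = list(replica_states(status))
    trouble = [row for row in rows if row[3] in TROUBLE_STATES]
    if mode == "ok":
        return {"all_ok": not trouble, "replicas": len(rows), "in_trouble": len(trouble)}
    if mode == "trouble":
        return {"all_ok": not trouble, "replicas": trouble}
    if mode == "all":
        return {"all_ok": not trouble, "replicas": rows}
    raise ValueError("mode must be 'ok', 'trouble' or 'all'")
```

Running `recovery_report(fetch_cluster_status(), "trouble")` every few seconds gives a crude watch on which replicas are still recovering; for actual progress (bytes copied, tlog entries replayed) you would still have to read the logs, as Erick notes.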