Thanks, guys. Glad to know the scripts have worked well in your experience. (Well, they are indeed quite simple.) That's roughly how I imagined we should do it, except that you added a very good point: the monitoring system can invoke a script to take the slave out of the load balancer. I'd like to implement this idea.
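
Something along these lines is what I have in mind. Just a rough sketch: the data directory paths, the snapshot.current location, the slave list, and the take_out_of_rotation / mail hooks are placeholders for our own setup, not part of the stock distribution scripts.

#!/bin/bash
# Rough monitoring sketch: check whether each slave has installed the
# master's latest snapshot; if not, pull it from the load balancer rotation
# and notify someone. Paths, hosts, and the take_out_of_rotation hook are
# placeholders, not part of the standard Solr scripts.

MASTER_DATA_DIR=/var/solr/data
SLAVE_DATA_DIR=/var/solr/data
SLAVES="slave1 slave2 slave3"

# Latest snapshot on the master (snapshot.YYYYMMDDHHMMSS naming).
latest=$(basename "$(ls -d ${MASTER_DATA_DIR}/snapshot.* | sort | tail -1)")

for slave in $SLAVES; do
    # Assumes the installed snapshot name is recorded in logs/snapshot.current
    # on the slave; adjust the path if your layout differs.
    current=$(ssh "$slave" "cat ${SLAVE_DATA_DIR}/logs/snapshot.current" 2>/dev/null)

    if [ "$current" != "$latest" ]; then
        echo "$(date): $slave has '$current', master has '$latest'"
        take_out_of_rotation "$slave"   # placeholder: site-specific load balancer hook
        echo "$slave is behind ($current vs $latest)" \
            | mail -s "Solr slave $slave taken out of rotation" ops@example.com
    fi
done

The cron frequency and the exact status check would depend on how often snapshots are taken, of course.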
Cheers,
-Hui

On 8/17/07, Bill Au <[EMAIL PROTECTED]> wrote:
>
> If snapinstaller fails to install the latest snapshot, then chances are
> that it would not be able to install any earlier snapshots either. All it
> does is some very simple filesystem operations and then invoke the Solr
> server to do a commit. I agree with Chris that the best thing to do is to
> take it out of rotation and fix the underlying problem.
>
> Bill
>
> On 8/17/07, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> > : So it looks like all we can do is monitor the logs and alert people to
> > : fix the issue and rerun the scripts, etc., whenever failures occur. Is
> > : that the correct understanding?
> >
> > I have *never* seen snappuller or snapinstaller fail (except during an
> > initial rollout of Solr when I forgot to set up the necessary ssh keys).
> >
> > I suppose we could add an option to snapinstaller to support explicitly
> > installing a snapshot by name ... then if you detect that slave Z didn't
> > load the latest snapshot, you could always tell the other slaves to
> > snapinstall whatever older version slave Z is still using -- but frankly
> > that seems a little silly -- not to mention that if you couldn't load the
> > snapshot into Z, odds are Z isn't responding to queries either.
> >
> > A better course of action might just be to have an automated system which
> > monitors the distribution status info on the master, and takes any slaves
> > that don't update it properly out of your load balancer's rotation (and
> > notifies people to look into it).
> >
> >
> > -Hoss
> >
>

--
Regards,
-Hui