: So it looks like all we can do is monitor the logs and alert people to
: fix the issue and rerun the scripts, etc., whenever failures occur. Is that
: the correct understanding?

I have *never* seen snappuller or snapinstaller fail (except during an
initial rollout of Solr when I forgot to set up the necessary ssh keys).

I suppose we could add an option to snapinstaller to support explicitly
installing a snapshot by name ... then if you detect that slave Z didn't
load the latest snapshot, you could always tell the other slaves to
snapinstall whatever older version slave Z is still using. Frankly, though,
that seems a little silly, not to mention that if you couldn't load the
snapshot into Z, odds are Z isn't responding to queries either.

A better course of action might be to have an automated system that
monitors the distribution status info on the master, takes any slaves
that don't update properly out of your load balancer's rotation, and
notifies people to look into it.



-Hoss
