Hi, there,

We want to use Solr's Collection Distribution, and I have a question about
recovering from failures of its scripts. My understanding is:

* If snappuller fails on a slave, we could implement something where the
master examines the status messages from all slaves and notifies them to run
snapinstaller only if every status is success.

* However, if snapinstaller then fails on a slave, there is no simple
operation to roll back so that all slaves keep the same old index. Besides,
a snapinstaller failure is usually caused by a hardware, network, or Solr
problem, and that same problem may prevent a rollback operation from
running, even if one existed.

It seems possible to implement a 2-phase-commit-like protocol that provides
automatic recovery and keeps all slave indexes consistent at all times.
However, first, I don't see a rollback operation for snapinstaller; and
second, this would definitely complicate the system.
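
To make the idea concrete, here is a rough Python sketch of the coordination
I have in mind. It is only an illustration: the pull, install and rollback
hooks are hypothetical callables that would have to wrap remote runs of
snappuller and snapinstaller, and the rollback hook in particular is an
assumption, since as far as I can tell no such script exists.

```python
from typing import Callable, Dict, List

# Hypothetical hook type: takes a slave host name, returns True on success.
# Real hooks would wrap remote invocations of snappuller / snapinstaller;
# the rollback hook is an assumption, not an existing Solr script.
Step = Callable[[str], bool]


def distribute(slaves: List[str], pull: Step, install: Step,
               rollback: Step) -> Dict[str, List[str]]:
    """Pull on every slave, install only if all pulls succeeded, and try
    to roll back the slaves that installed if any install fails."""
    result: Dict[str, List[str]] = {
        "pull_failed": [],
        "install_failed": [],
        "rolled_back": [],
        "rollback_failed": [],
    }

    # Phase 1: every slave must pull the snapshot successfully.
    for slave in slaves:
        if not pull(slave):
            result["pull_failed"].append(slave)
    if result["pull_failed"]:
        # Nobody installs, so all slaves keep serving the old index.
        return result

    # Phase 2: install everywhere, remembering which slaves succeeded.
    installed: List[str] = []
    for slave in slaves:
        if install(slave):
            installed.append(slave)
        else:
            result["install_failed"].append(slave)

    # If any install failed, attempt rollback on the slaves that installed.
    if result["install_failed"]:
        for slave in installed:
            if rollback(slave):
                result["rolled_back"].append(slave)
            else:
                result["rollback_failed"].append(slave)
    return result
```

Anything that ends up in "install_failed" or "rollback_failed" would still
need a person to look at it, which is the part I don't see how to automate.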

So it looks like all we can do is monitor the logs, alert people when
failures occur, and have them fix the issue and rerun the scripts. Is that
the correct understanding?


Thanks,

-Hui
