: The ReplicationHandler still works when you use SolrCloud, right? can't you
: just replicate from one (or N, depending on the number of shards) of the
: nodes in the cluster? That way you could keep a Solr instance that's only
: used to replicate the indexes, and you could have it somewhere else (other

If you only replicated from one node in the cluster, you would only get 
backups of the shards that exist on that node -- not any shards that 
only exist on other machines.

I think that's what Tommaso was suggesting: a tool/client that could ask 
ZK about the cluster state, and then use that to generate a list of 
collection => shards+nodes so that it could ensure it SnapPulled a copy of 
every shard for every collection from some node.
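
To make that concrete, here is a rough sketch of the planning half of such a 
tool. It assumes the cluster state has already been read out of ZK and parsed 
into a dict shaped roughly like clusterstate.json (collection => shards => 
replicas, each with base_url, core, state and an optional leader flag). The 
function names, the sample hosts, and the idea of one backup core per shard 
are made up for illustration; fetchindex and masterUrl are the usual 
ReplicationHandler parameters, but check them against your Solr version.

```python
from urllib.parse import urlencode

# Hypothetical cluster state, shaped like a parsed clusterstate.json.
SAMPLE_STATE = {
    "c1": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"base_url": "http://n1:8983/solr", "core": "c1_shard1_replica1",
                           "state": "active", "leader": "true"},
            "core_node2": {"base_url": "http://n2:8983/solr", "core": "c1_shard1_replica2",
                           "state": "active"}}},
        "shard2": {"replicas": {
            "core_node3": {"base_url": "http://n1:8983/solr", "core": "c1_shard2_replica2",
                           "state": "down"},
            "core_node4": {"base_url": "http://n2:8983/solr", "core": "c1_shard2_replica1",
                           "state": "active", "leader": "true"}}},
    }},
}


def pick_source_per_shard(cluster_state):
    """Map (collection, shard) to the replication URL of one active replica."""
    plan = {}
    for collection, cdata in cluster_state.items():
        for shard, sdata in cdata.get("shards", {}).items():
            active = [r for r in sdata.get("replicas", {}).values()
                      if r.get("state") == "active"]
            if not active:
                raise RuntimeError(f"no active replica for {collection}/{shard}")
            # Prefer the leader when it is active; otherwise take any active replica.
            active.sort(key=lambda r: r.get("leader") != "true")
            src = active[0]
            plan[(collection, shard)] = f"{src['base_url']}/{src['core']}/replication"
    return plan


def fetchindex_url(backup_core_url, source_replication_url):
    """Build the request that tells a backup core to pull from the chosen source."""
    query = urlencode({"command": "fetchindex", "masterUrl": source_replication_url})
    return f"{backup_core_url}/replication?{query}"


if __name__ == "__main__":
    for (collection, shard), source in pick_source_per_shard(SAMPLE_STATE).items():
        # Assumption: the backup box has one core per shard, named collection_shard.
        print(fetchindex_url(f"http://backup:8983/solr/{collection}_{shard}", source))
```

Splitting the work across several backup servers would then just be a matter 
of partitioning the keys of that plan.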

Of course: if your collections are big enough that you are sharding, 
trying to have a single backup server probably wouldn't be viable anyway, 
so a tool like that would need options to split the work up.

An alternate strategy might be to leverage the existing backup 
functionality of the ReplicationHandler, but add logic to make it zk/cloud 
aware, so that a single request to "backup" for a collection would 
propagate to all of the shard leaders to (delegate to a node to) backup 
that shard -- then you just need to configure the backup location for the 
ReplicationHandler to be a directory that is on your NAS.
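
Until something like that exists inside Solr, the fan-out could be 
approximated from the outside. The sketch below only builds the per-leader 
backup requests; it assumes the same clusterstate.json-like dict as input, 
and the function name, the sample hosts, and the per-shard subdirectory 
layout under the NAS path are all hypothetical. command=backup with a 
location parameter is how the ReplicationHandler backup is normally 
triggered, but verify that against your version.

```python
from urllib.parse import urlencode

# Hypothetical cluster state, shaped like a parsed clusterstate.json.
SAMPLE_STATE = {
    "c1": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"base_url": "http://n1:8983/solr", "core": "c1_shard1_replica1",
                           "state": "active", "leader": "true"},
            "core_node2": {"base_url": "http://n2:8983/solr", "core": "c1_shard1_replica2",
                           "state": "active"}}},
        "shard2": {"replicas": {
            "core_node4": {"base_url": "http://n2:8983/solr", "core": "c1_shard2_replica1",
                           "state": "active", "leader": "true"}}},
    }},
}


def leader_backup_urls(cluster_state, collection, location):
    """Return {shard: backup request URL} addressed to each shard's leader."""
    urls = {}
    for shard, sdata in cluster_state[collection]["shards"].items():
        leaders = [r for r in sdata["replicas"].values() if r.get("leader") == "true"]
        if len(leaders) != 1:
            raise RuntimeError(f"expected exactly one leader for {collection}/{shard}")
        leader = leaders[0]
        # Assumption: each shard gets its own subdirectory under the NAS mount.
        query = urlencode({"command": "backup",
                           "location": f"{location}/{collection}/{shard}"})
        urls[shard] = f"{leader['base_url']}/{leader['core']}/replication?{query}"
    return urls


if __name__ == "__main__":
    for shard, url in leader_backup_urls(SAMPLE_STATE, "c1", "/mnt/nas/solr-backups").items():
        print(shard, url)
```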


-Hoss
