Greg,

Is Solr your main system of record, or is it a secondary index to a primary data store?

Depending on the answer to that question, I would recommend different options. If it is primary, then I would ask what the underlying compute infrastructure is: containers, VMs, or bare metal? There are some decent distributed shared file system services that could be leveraged, depending on the number of compute nodes.

A shared file system is the best way to keep it consistent, but it comes with its drawbacks. You can also back up locally and sync to the shared FS asynchronously.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On May 30, 2018, 5:16 PM -0400, Greg Roodt <gro...@gmail.com>, wrote:
> Thanks for the confirmation Shawn. Distributed systems are hard, so this
> makes sense.
>
> I have a large, stable cluster (stable in terms of leadership and
> performance) with a single shard. The cluster scales up and down with
> additional PULL replicas over the day with the traffic curve.
>
> It's going to take a bit of coordination to get all nodes to mount a shared
> volume when we take a backup and then unmount when done.
>
> Any idea what happens if a node joins or leaves during a backup?
>
> On Thu, 31 May 2018 at 06:14, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 5/29/2018 3:01 PM, Greg Roodt wrote:
> > > What is the best way to perform a backup of a Solr Cloud cluster? Is there
> > > a way to backup only the leader? From my tests with the collections admin
> > > BACKUP command, all nodes in the cluster need to have access to a shared
> > > filesystem. Surely that isn't necessary if you are backing up the leader or
> > > TLOG replica?
> >
> > If you have more than one Solr instance in your cloud, then all of those
> > instances must have access to the same filesystem accessed from the same
> > mount point. Together, they will write the entire collection to various
> > subdirectories in that location.
> >
> > I can't find any mention of whether backups are load balanced across the
> > cloud, or if they always use leaders. I would assume the former. If
> > that's how it works, then you don't know which machine is going to do
> > the backup of a given shard. Even if the backup always uses leaders,
> > you can't always be sure of where a leader is. It can change from
> > moment to moment, especially if you're having stability problems with
> > your cloud.
> >
> > At restore time, there's a similar situation. You don't know which
> > machine(s) in the cloud are going to be actually loading index data from
> > the backup location. So they all need to have access to the same data.
> >
> > Thanks,
> > Shawn
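
For reference, the Collections API BACKUP command discussed in this thread is a plain HTTP request, so it may help to see how the URL is put together. Below is a minimal sketch in Python, not a definitive implementation. The helper name `build_backup_url`, the host, the collection name, the backup name, and the mount path are all hypothetical placeholders. The `location` value is assumed to be the shared mount point that every node can reach, as Shawn describes above.

```python
from urllib.parse import urlencode


def build_backup_url(base_url, collection, backup_name, location, async_id=None):
    """Build a Collections API BACKUP request URL (hypothetical helper).

    base_url  -- Solr base URL, e.g. "http://localhost:8983/solr" (placeholder)
    location  -- backup directory; assumed to be a shared mount visible
                 at the same path on every node in the cloud
    async_id  -- optional request id so the call returns immediately
    """
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,
    }
    if async_id is not None:
        params["async"] = async_id
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    print(build_backup_url(
        "http://localhost:8983/solr",
        "my_collection",
        "nightly",
        "/mnt/solr_backups",
    ))
```

This only builds the URL; sending it (for example with curl) and polling the status of an async request are left out.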