Hi Rahul

Solr is a secondary index. The system of record is an RDBMS.

I'm currently looking at using AWS Elastic File System. Have you got any
experience with this? I also thought about trying s3fs.

When you say back up locally, what do you mean? Back up the files on disc
without the associated ZooKeeper config? Or something else?

Thanks
Greg

On Thu, 31 May 2018 at 20:08, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:

> Greg,
>
> Is Solr your main system of record or is it a secondary index to a primary
> data store?
>
> Depending on the answer to that question I would recommend different
> options.
>
> If primary, then I would ask what is the underlying compute
> infrastructure. Is it container, VM, or bare metal?
>
> There are some decent distributed shared file system services that could
> be leveraged depending on the number of compute nodes.
>
> Shared file system is the best way to keep it consistent but it comes with
> its drawbacks. You can always back up locally and asynchronously sync to
> shared FS too.
>
> --
> Rahul Singh
> rahul.si...@anant.us
>
> Anant Corporation
> On May 30, 2018, 5:16 PM -0400, Greg Roodt <gro...@gmail.com>, wrote:
> > Thanks for the confirmation Shawn. Distributed systems are hard, so this
> > makes sense.
> >
> > I have a large, stable cluster (stable in terms of leadership and
> > performance) with a single shard. The cluster scales up and down with
> > additional PULL replicas over the day with the traffic curve.
> >
> > It's going to take a bit of coordination to get all nodes to mount a
> > shared volume when we take a backup and then unmount when done.
> >
> > Any idea what happens if a node joins or leaves during a backup?
> >
> > On Thu, 31 May 2018 at 06:14, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 5/29/2018 3:01 PM, Greg Roodt wrote:
> > > > What is the best way to perform a backup of a Solr Cloud cluster? Is
> > > > there a way to back up only the leader? From my tests with the
> > > > collections admin BACKUP command, all nodes in the cluster need to
> > > > have access to a shared filesystem. Surely that isn't necessary if
> > > > you are backing up the leader or TLOG replica?
> > >
> > > If you have more than one Solr instance in your cloud, then all of
> > > those instances must have access to the same filesystem accessed from
> > > the same mount point. Together, they will write the entire collection
> > > to various subdirectories in that location.
> > >
> > > I can't find any mention of whether backups are load balanced across
> > > the cloud, or if they always use leaders. I would assume the former.
> > > If that's how it works, then you don't know which machine is going to
> > > do the backup of a given shard. Even if the backup always uses leaders,
> > > you can't always be sure of where a leader is. It can change from
> > > moment to moment, especially if you're having stability problems with
> > > your cloud.
> > >
> > > At restore time, there's a similar situation. You don't know which
> > > machine(s) in the cloud are going to be actually loading index data
> > > from the backup location. So they all need to have access to the same
> > > data.
> > >
> > > Thanks,
> > > Shawn
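
For reference, the Collections API BACKUP call discussed in this thread might be put together roughly as below. This is only a sketch: it builds the request URL and does not send anything. The host, collection name, backup name, and mount path in the usage line are made-up placeholders, and the helper name build_backup_url is hypothetical. The `location` value is assumed to be the shared path that every node has mounted at the same mount point, as Shawn describes.

```python
from urllib.parse import urlencode


def build_backup_url(solr_base_url, collection, backup_name, location, async_id=None):
    """Build a Collections API BACKUP request URL. No request is sent."""
    params = {
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        # Assumed to be a path on the shared filesystem that all nodes
        # can reach at the same mount point.
        "location": location,
    }
    if async_id is not None:
        # Optional request id so the call can run asynchronously.
        params["async"] = async_id
    return f"{solr_base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(build_backup_url("http://localhost:8983/solr", "products",
                           "nightly", "/mnt/solr-backups"))
```

Keeping the URL construction separate from the HTTP call makes it easy to check the parameters before pointing it at a real cluster.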