On Tue, Aug 2, 2011 at 2:07 AM, zhiwei chen <zhiw...@gmail.com> wrote:
> hi, everyone.
> We have many svn repositories, more than 100,000, but every repository has
> less than 1024M.
> So, which svn backup strategies should I use?
Great bird of space, what are you running? SourceForge? You're approaching 100 TB of repository space!!! I have to assume that 99.9% of these are idle, auto-generated repositories created as part of some regression-testing or continuous-build structure.

I went through something like this with an in-house backup system that used a database to manage hardlinks, and most of whose directories had no actual edits or unlocked files in them. I had to optimize it by basically ignoring all the non-active equivalent of tags, which turned an insane 5-day restoration procedure into a 2-hour restoration procedure.

I assume that the old, stable repositories are what most of us would use as tags: suitable to lock down and back up with rsync, star, or a similar tool that will not re-copy every byte every time you run it, that can be run twice without overwriting already-transmitted files, and that can be gracefully managed to select or deselect targets. This will mirror not only the revisions, but the file ownership, authentication, and scripting internal to the repository. It won't mirror HTTP access or web configs, or SSH-based access configurations, so treat those separately. (There's a rough sketch of this kind of idle-repository sweep at the end of this message.)

That said, the databases can be synchronized with svnsync on a remote server for efficiency, and to help avoid corruption issues from mirroring files in the midst of database interactions. This will *not* gain you failover repositories with identical UUIDs suitable for "svn switch" operations, but it will allow you to update your backup server's Subversion binaries without interfering with the primary system. (See the second sketch below for the svnsync bootstrap.)

Any repository that has had updates since the last svnsync, svnadmin dump, or other backup, however, will be prone to "split-brain" problems, where a new revision committed on the failover or recovered server does not match the revision that previously had the same number on the original server, and chaos will ensue. Split-brain is something that people don't seem to worry about much for small repositories: you can notify your clients that they need to re-checkout their working copies and copy over their working files, and they'll only lose some recent commits. But it's potentially really, really nasty for automated procedures. (The third sketch below shows one way to detect it.)

Frankly, this is the point where you call WANdisco and say "Hi, I've got a problem: do you have a commercial-grade solution?" They have tools that will do multi-master setups and avoid the split-brain problem, and they have probably already addressed the backup needs.
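To make the "treat old repositories like tags" idea concrete, here's a rough Python sketch of the sweep I mean. The paths, the 90-day idle threshold, and using the mtime of the FSFS db/current file as an activity proxy are all my assumptions; adjust for your layout and repository format.

    import os
    import subprocess
    import time

    REPO_ROOT = "/srv/svn"         # hypothetical: where the repositories live
    BACKUP_ROOT = "/backup/svn"    # hypothetical rsync target
    IDLE_SECONDS = 90 * 24 * 3600  # assumption: untouched for 90 days = "tag"

    def last_commit_time(repo):
        # FSFS rewrites db/current on every commit, so its mtime is a
        # cheap proxy for last activity without opening the repository.
        return os.path.getmtime(os.path.join(repo, "db", "current"))

    for name in sorted(os.listdir(REPO_ROOT)):
        repo = os.path.join(REPO_ROOT, name)
        if not os.path.isfile(os.path.join(repo, "db", "current")):
            continue  # not an FSFS repository
        if time.time() - last_commit_time(repo) < IDLE_SECONDS:
            continue  # still active: leave it to svnsync instead
        # rsync --archive preserves ownership, permissions, and hook
        # scripts, and re-copies only files changed since the last run.
        subprocess.check_call(
            ["rsync", "--archive", "--delete",
             repo + "/", os.path.join(BACKUP_ROOT, name) + "/"])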
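And here's a minimal sketch of bootstrapping an svnsync mirror for one repository; the URLs and paths are placeholders. Note the pre-revprop-change hook: svnsync keeps its bookkeeping in revision properties on the mirror, and revprop changes are refused unless a hook allows them.

    import os
    import subprocess

    SOURCE_URL = "http://svn.example.com/project-x"  # placeholder
    MIRROR_PATH = "/backup/mirrors/project-x"        # placeholder
    MIRROR_URL = "file://" + MIRROR_PATH

    subprocess.check_call(["svnadmin", "create", MIRROR_PATH])

    # Allow svnsync to set revision properties on the mirror.
    hook = os.path.join(MIRROR_PATH, "hooks", "pre-revprop-change")
    with open(hook, "w") as f:
        f.write("#!/bin/sh\nexit 0\n")
    os.chmod(hook, 0o755)

    # init records the source URL on the mirror; sync replays revisions
    # and is safe to re-run from cron as often as you like.
    subprocess.check_call(["svnsync", "init", MIRROR_URL, SOURCE_URL])
    subprocess.check_call(["svnsync", "sync", MIRROR_URL])

At 100,000 repositories you'd obviously drive this from the same inventory loop as the rsync sweep above, rather than per-repository by hand.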
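Finally, a sketch of the sanity check I'd want before letting commits resume after a failover: compare head revisions, then compare the author/date fingerprint of the highest revision the two copies have in common. Paths are placeholders again, and this only detects the divergence, it doesn't repair it.

    import subprocess

    def youngest(repo_path):
        # "svnlook youngest" prints the head revision of a local repository.
        return int(subprocess.check_output(
            ["svnlook", "youngest", repo_path]).strip())

    def fingerprint(repo_path, rev):
        # Author plus datestamp of one revision; if these differ for the
        # same revision number, the two histories have diverged.
        r = str(rev)
        return (subprocess.check_output(["svnlook", "author", "-r", r, repo_path]),
                subprocess.check_output(["svnlook", "date", "-r", r, repo_path]))

    master = "/srv/svn/project-x"          # placeholder paths
    mirror = "/backup/mirrors/project-x"

    common = min(youngest(master), youngest(mirror))
    if fingerprint(master, common) != fingerprint(mirror, common):
        raise SystemExit("split-brain: r%d differs between master and mirror"
                         % common)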