Active-Active Clustering with Subversion
Hi All,

I'm starting a new project to consolidate all svn repos across our company into a single instance. Originally we looked at doing an active-passive cluster, but after looking at the loads on the current individual svn repos, we are thinking that an active-active cluster would be preferable.

My question is, is it possible/safe to have two apache/svn nodes accessing the same repo on the same storage system, shared out via NFS v3? Of course the repo DB will be formatted with type FSFS, but we are concerned about data corruption with multiple nodes doing commits to the same repo. Does anyone have any experience using svn in this or a similar configuration?

Thanks
B
Re: Active-Active Clustering with Subversion
Thanks for the reply Ryan,

I'll have to look further into how locking is set up on our NetApp FAS 3070. We were also considering using GFS to handle the locking. Have you heard anything about users having multiple svn compute nodes connecting to a repo on GFS and using a distributed lock manager?

I saw that some people were using svnsync and write-through proxying. I was concerned about the read-only copies keeping up during nightly builds, when our developers will often go through thousands of commits. I'll have to look into the documentation for svnsync a little closer.

B

On Fri, May 7, 2010 at 11:50 AM, Ryan Schmidt <subversion-20...@ryandesign.com> wrote:

> On May 7, 2010, at 10:26, BD wrote:
>
> > I'm starting a new project to consolidate all svn repos across our company into a single instance. Originally we looked at doing an active-passive cluster, but after looking at the loads on the current individual svn repos, we are thinking that an active-active cluster would be preferable.
> >
> > My question is, is it possible/safe to have two apache/svn nodes accessing the same repo on the same storage system, shared out via NFS v3? Of course the repo DB will be formatted with type FSFS, but we are concerned about data corruption with multiple nodes doing commits to the same repo. Does anyone have any experience using svn in this or a similar configuration?
>
> Hosting a repo on NFS can work, but so many people write here for help after trying to do so and finding it doesn't work for them. It depends on whether your NFS implementation supports proper locking.
>
> I've been told before that to do active-active clustering, you would want to have the repository data located on a cluster filesystem (e.g. Apple Xsan) accessed by both servers. Otherwise data corruption would indeed be a concern.
>
> But, these days, you could have a simpler setup with two (or more) standalone servers which mirror each other's contents using svnsync. Write requests would have to happen on a single master server only, but the mirrors could be configured with a write-through proxy to make this transparent. You should be able to find documentation on setting these up.
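For readers following along, the mirror side of the svnsync plus write-through proxy setup Ryan describes might look roughly like the fragment below. This is only a sketch under stated assumptions: the hostname `svn-master.example.com`, the `/svn` location and the `/var/svn/repos` path are hypothetical placeholders, not anything from this thread. `SVNMasterURI` is the mod_dav_svn directive that forwards write requests from a mirror to the master; the mirror's repository is first set up with `svnsync initialize` and then kept current with `svnsync synchronize`, typically triggered from a post-commit hook on the master.

```apache
# Mirror (read-only slave) vhost fragment. All names and paths are hypothetical.
<Location /svn>
  DAV svn
  # Local repository that svnsync keeps up to date
  SVNPath /var/svn/repos
  # Write requests received here are proxied to the master
  SVNMasterURI http://svn-master.example.com/svn
</Location>
```

Reads are then served from the local mirror, while commits go to the single master, so there is still only one writer per repository.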
Re: Active-Active Clustering with Subversion
Thanks Les,

I know NFS itself can certainly be a bottleneck. However, we will be devoting at least three shelves of disk on our NetApp 3070, which in our standard RAID group size will make for about 38 data spindles, and we will have 256 GB of read cache per head on a two-head storage system. Initially we don't expect compute capacity to be a problem with our SVN setup, but we are a growing company and are planning this SVN cluster to be scalable with the organization.

So the question remains, taking physical constraints out of the question: is there anyone out there who knows about managing the risks associated with having two or more apache/svn nodes accessing repos that are stored on a shared NFS storage system, with the SVN DBs using FSFS?

B

On Fri, May 7, 2010 at 1:05 PM, Les Mikesell wrote:

> On 5/7/2010 10:26 AM, BD wrote:
>
>> Hi All,
>>
>> I'm starting a new project to consolidate all svn repos across our company into a single instance. Originally we looked at doing an active-passive cluster, but after looking at the loads on the current individual svn repos, we are thinking that an active-active cluster would be preferable.
>>
>> My question is, is it possible/safe to have two apache/svn nodes accessing the same repo on the same storage system, shared out via NFS v3? Of course the repo DB will be formatted with type FSFS, but we are concerned about data corruption with multiple nodes doing commits to the same repo. Does anyone have any experience using svn in this or a similar configuration?
>
> The underlying disk system itself is probably the bottleneck, so distributing access isn't likely to help performance that much anyway. I'd expect bigger gains from beefing up the storage unit (make the RAID distribute over more drives, don't share those drives with other work, use a controller with battery-backed buffering, etc.).
>
> --
> Les Mikesell
> lesmikes...@gmail.com
Re: Active-Active Clustering with Subversion
Thanks Hyrum,

That's a very interesting way of looking at this problem. It does make sense that multiple commit processes coming from the same machine really wouldn't be that different from the question I was asking.

I guess from here I'll have to do some testing, with NFS in the mix, and see if I can purposely corrupt data by running many commit requests from two separate apache nodes. But if what you're saying is right, it sounds like I shouldn't have much in the way of problems.

Thanks again!
B

On Mon, May 10, 2010 at 5:03 AM, Hyrum K. Wright <hyrum_wri...@mail.utexas.edu> wrote:

> On Fri, May 7, 2010 at 8:08 PM, BD wrote:
>
>> So the question remains, taking physical constraints out of the question: is there anyone out there who knows about managing the risks associated with having two or more apache/svn nodes accessing repos that are stored on a shared NFS storage system, with the SVN DBs using FSFS?
>
> I can't comment on your specific situation, but Subversion repositories are designed to be accessed by multiple concurrent processes, even if these processes are located on separate hosts. When using a single instance of Apache, for example, multiple requests can often spawn multiple processes which all interact (correctly) with the Subversion repository. In addition, the write-serialization window is relatively small, and writers do not block readers, so even during long-running parallel commits, read operations will still work as expected.
>
> Throwing NFS in the mix here may complicate things a bit, but probably not by much.
>
> -Hyrum
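The write serialization Hyrum mentions, and the kind of stress test B proposes, can be illustrated with a toy model. The sketch below is not Subversion's code and does not touch a real repository: it assumes a commit boils down to "take an exclusive lock, read the current revision number, write back revision + 1", and it uses POSIX advisory locks via Python's `fcntl.lockf`. The file names `write-lock` and `current`, the function `run_commit_race` and the worker counts are all illustrative choices. Several independent processes hammer the same counter; if locking works, no increment is lost. Pointing the working directory at an NFS mount shared by two hosts would turn this into a rough check of whether that mount honours locks.

```python
import os
import subprocess
import sys
import tempfile

# Script run by each simulated committer process (hypothetical model of a commit).
WORKER = r"""
import fcntl, sys
lock_path, current_path, n = sys.argv[1], sys.argv[2], int(sys.argv[3])
for _ in range(n):
    with open(lock_path, "w") as lock:
        fcntl.lockf(lock, fcntl.LOCK_EX)   # block until we own the write lock
        with open(current_path) as f:
            rev = int(f.read())
        with open(current_path, "w") as f:
            f.write(str(rev + 1))          # "commit" the next revision
    # closing the lock file releases the lock
"""


def run_commit_race(workers=4, commits_each=50, repo_dir=None):
    """Run concurrent committers and return the final revision number."""
    with tempfile.TemporaryDirectory(dir=repo_dir) as repo:
        lock_path = os.path.join(repo, "write-lock")
        current_path = os.path.join(repo, "current")
        with open(current_path, "w") as f:
            f.write("0")
        open(lock_path, "w").close()
        procs = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER,
                 lock_path, current_path, str(commits_each)]
            )
            for _ in range(workers)
        ]
        for p in procs:
            if p.wait() != 0:
                raise RuntimeError("a committer process failed")
        with open(current_path) as f:
            return int(f.read())


if __name__ == "__main__":
    # With working locks, every increment survives: workers * commits_each.
    print(run_commit_race())
```

A final count lower than `workers * commits_each` would mean two committers were inside the critical section at once, which is the failure mode to worry about on an NFS setup without proper lock support. Readers are deliberately left out of the model, since they never take the write lock.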