(Apologies for the top-posting. I really need to stop using this Yahoo web interface, which is useless for quoting.)

Thanks for all the replies. I'll try out what you outlined.

Unfortunately, there are problems outside my control that make this harder: for company-internal policy reasons, I'm not allowed direct access to the server. I can only get a copy of the repo to work with, plus a promise that they will replace the repo with my modified version when I'm done. This might make some of the suggestions hard to apply, but I'll see if it seems possible. Also, the server runs 1.8, and I have no authority to get it upgraded. I think I may have a chance to change the read permissions for the sync user though, so there's a ray of light somewhere in there :)

W.r.t. Johan's question about the time consumption for dumping: I haven't yet been able to test it myself. I only got this as second-hand info from someone who did a dump of the repo last year, so I hope it is completely incorrect. Will try dumping as soon as I get my hands on a repo copy.

Regarding why the repo is so large: my estimate from running some analysis on old revisions is that 90-95% of the data consists of beginners accidentally committing things that should never have been allowed into the repo.

BR,
Chris

--------------------------------------------
On Thu, 10/4/18, Johan Corveleyn <jcor...@gmail.com> wrote:

Subject: Re: svndumpfilter and svnsync?
To: "Chris" <devnullacco...@yahoo.se>
Cc: "Ryan Schmidt" <subversion-2...@ryandesign.com>, "Daniel Shahaf" <d...@daniel.shahaf.name>, "Subversion" <users@subversion.apache.org>
Date: Thursday, October 4, 2018, 2:36 PM

On Thu, Oct 4, 2018 at 2:33 PM Daniel Shahaf <d...@daniel.shahaf.name> wrote:
>
> Ryan Schmidt wrote on Thu, 04 Oct 2018 06:04 -0500:
> > On Oct 4, 2018, at 02:32, Chris wrote:
> > > I figured using svnsync to get the "cleaned repo" up to date with the changes on the "live repo", but a note in the svnsync documentation says "The only commits and revision property modifications that ever occur on that mirror repository should be those performed by the svnsync tool". Does that also include this kind of cleanup operation where I remove paths that don't exist on HEAD?
>
> Yes. The precondition for running 'svnsync' is that every revision in
> the target repository is identical to the corresponding revision in the
> source repository. "Correspondence", in this sense, simply means
> numeric equality: r5 must correspond to r5, not to r6 nor to r4.
>
> > > If I shouldn't use svnsync for this, what should I do instead?
>
> You should use svnsync and set the source repository's URL to a URL
> that has authz restrictions denying read to the large binary blobs.
>
> That's it.

Indeed, like Daniel said, you can do this with svnsync by setting up an authz configuration on the source repository that denies the svnsync user read access to the problematic files (see [1]).

Also, I'm quite surprised that dumping your repository takes 2 weeks. What version of svn are you using? I'm used to 'load' taking a long time (though that has been improved a lot in 1.10 by the new --no-flush-to-disk option for 'svnadmin load' [2]), but 'dump' shouldn't take that long. Perhaps the problem is that the dump file is getting way too large. You can also consider piping 'svnadmin dump | svndumpfilter | svnadmin load', so that no intermediate dump file is written at all.

I would also suggest you read this FAQ entry [3], where I documented a procedure (which I've used myself) to perform a dump + load while the source repo is still fully online. The initial dump + load can take a long time. Then you follow up with an incremental dump + load to catch up with the commits that happened in the meantime. You can repeat this catch-up step as many times as you like, so that the "final catch-up" needs only minimal downtime.

Another useful thing for you to look at is the new --include and --exclude options for 'svnadmin dump' itself, which were added in svn 1.10 [4]. These work in a similar way to svnsync combined with denying read access via authz. If you go that route, you don't need to use svndumpfilter.

[1] http://subversion.apache.org/faq.html#removal
[2] http://subversion.apache.org/docs/release-notes/1.10.html#no-flush-to-disk
[3] http://subversion.apache.org/faq.html#dumpload
[4] http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude

--
Johan
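
For reference, the authz approach that Daniel and Johan describe might look roughly like the fragment below. This is only a sketch under stated assumptions: the user name syncuser and the path /trunk/accidental-binaries are made up for illustration, and it assumes the repo copy is served through svnserve or Apache, since path-based authz is not enforced when a repository is accessed over a file:// URL.

```ini
# authz file for the source repository.
# All names here are hypothetical placeholders, not taken from the thread.

[/]
# Everyone keeps read access to the tree as a whole...
* = r

[/trunk/accidental-binaries]
# ...but the account that svnsync logs in as gets no access to this subtree.
syncuser =
```

With rules along these lines, an svnsync run authenticated as syncuser should not see the denied paths, so they would never reach the mirror. The real list of paths would have to come from the analysis of the old revisions mentioned above.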