On Oct 4, 2018, at 02:32, Chris wrote:
> we have a repo that is in dire need of getting rid of some accidental commits
> that have added large binary blobs on old branches. I've looked at
> http://svnbook.red-bean.com/nightly/en/svn.reposadmin.maint.html#svn.reposadmin.maint.filtering
> which indicates I go about this by first doing "svnadmin dump" and then work
> with "svndumpfilter" to remove paths that contain these files.
>
> But, doing dump on this big repo (>30GB) supposedly takes more than 2 weeks
> to complete. And if it takes that long, then we can't have the repo offline
> while dumping/filtering and need to be able to "merge" the offline repo with
> the changes in the live repo when we have finished the cleaning.
>
> I figured using svnsync to get the "cleaned repo" up to date with the changes
> on the "live repo", but a note in the svnsync documentation says "The only
> commits and revision property modifications that ever occur on that mirror
> repository should be those performed by the svnsync tool". Does that also
> include this kind of cleanup operation where I remove paths that don't exist
> on HEAD? If I shouldn't use svnsync for this, what should I do instead?

My guess is that you will not be able to "merge" the offline repo with the
changes in the live repo if the offline repo has any changes of its own,
including the changes you make to clean things up. That would mean nobody can
commit to the repository while you perform the cleanup.

This does sound like an enormous repository, and if you can't bear to have it
offline for the time the cleanup takes, then you probably can't do the
cleanup.

However, it's possible that svnsync would work here. I haven't tried it. You
could try it and see whether you run into any errors. If you don't, I'd guess
it's OK.
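
If svnsync does refuse, a different route (not svnsync, and not something I
have run on a repository this size) would be to catch the cleaned repo up with
incremental dumps passed through the same filter. Here is a rough sketch of
that idea. The three paths and the function names are placeholders I made up.
It assumes the cleaned repo was created empty with "svnadmin create", and it
deliberately leaves out --drop-empty-revs and --renumber-revs so that revision
numbers stay identical in both repositories.

```shell
#!/bin/sh
# Untested sketch. LIVE, CLEAN and EXCLUDE are placeholder values.
LIVE=${LIVE:-/srv/svn/live}             # the repository people commit to
CLEAN=${CLEAN:-/srv/svn/clean}          # created beforehand with "svnadmin create"
EXCLUDE=${EXCLUDE:-branches/old/blobs}  # space-separated path prefixes to drop

# Print "LOWER:UPPER" for the revisions the clean repo still lacks,
# or nothing when it is already up to date.
# $1 = youngest revision in CLEAN, $2 = youngest revision in LIVE
missing_range() {
    if [ "$1" -lt "$2" ]; then
        echo "$(( $1 + 1 )):$2"
    fi
}

# Dump the missing revisions from LIVE, filter them, and load them into CLEAN.
# Filtered-out revisions are kept as empty revisions, so numbering is unchanged.
# Note: a failure in an earlier stage of the pipeline is not detected here.
sync_missing() {
    range=$(missing_range "$(svnlook youngest "$CLEAN")" \
                          "$(svnlook youngest "$LIVE")")
    [ -n "$range" ] || return 0
    # $EXCLUDE is intentionally unquoted so that several prefixes can be given.
    svnadmin dump "$LIVE" -r "$range" --incremental |
        svndumpfilter exclude $EXCLUDE |
        svnadmin load "$CLEAN"
}
```

The idea would be to call sync_missing once while the live repo stays online
(the long run), then block commits for a short window, call it again to pick
up whatever was committed in the meantime, and switch over. I would expect
svndumpfilter to stop with an error if a later revision copies from one of the
excluded paths, so try the whole thing on a copy first.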