I am going through a similar process myself and have some questions about your concerns. I'm not trying to rock the boat, just looking for clarity on a few points. For perspective, I am working with around 300 individual projects in a 70+ GB repository containing over 300k revisions.

> If I understand correctly, you manually retrieve each version where
> the given path/project has changed in any way to afterwards dump those
> revisions. Why is this better/faster than using svndumpfilter with
> specifying an include path, but without the need to post process the
> dump files?

I personally don't see the advantage of waiting around for svnadmin dump to process every unrelated revision. For one project, I am only concerned with about 200 revisions, spread out over 210k unrelated revisions.

    # This example took around 8 hours:
    svnadmin dump /path/to/master | svndumpfilter --drop-empty-revs \
        --renumber-revs include $PROJECT > $PROJECT.dump

    # However, when I run this on the same project:
    # (the "|" after the revision number is escaped so grep -E matches it literally)
    for rev in `svn log -r0:HEAD file:///path/to/master/$PROJECT |
            grep -E '^r[0-9]+ \|' | cut -d " " -f1`; do
        svnadmin dump --incremental -r ${rev:1} /path/to/master |
            svndumpfilter include $PROJECT >> $PROJECT.dump
    done

... I can have a usable dump file in under 30 seconds. I realize this will take longer for larger projects, but I think it makes my point. 'svnadmin dump' still creates a full dump stream for each revision before svndumpfilter sees that revision and decides whether to keep it, so the single-pipeline version pays that cost for all 210k unrelated revisions.

> Are you sure your approach doesn't need other paths
> from the repo, e.g. other source paths from copy operations for
> projects or stuff like that?

I absolutely agree with checking for this. You can't successfully pull out a single path using svnadmin dump / svndumpfilter if there are copies from a location outside of whatever you are filtering for. I did notice that pointing svnrdump at url/project seems to get around the outside-copy-sources issue, but I think that's another discussion altogether.

> > svnadmin dump $repo --quiet -r $rev --incremental >> $project.$rev.bak
>
> Adding to revision files with >> should be impossible in your
> approach.

Are you saying that appending to an existing dump file in general is a problem or just with all of his node-path processing? I have had no trouble appending to existing dump files.

Thanks,
Bryon Winger
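
P.S. On the copy-source question: one way to check a project before filtering might be to scan the verbose log for copies whose source lies outside the project path. The following is only a rough sketch. It assumes the default English `svn log -v` layout (a "Changed paths:" block with "(from /path:rev)" annotations), and the function name list_outside_copies is made up for illustration; it is not part of Subversion.

```shell
# list_outside_copies PROJECT
# Hypothetical helper: reads `svn log -v` output on stdin and prints
# "rev destination source" for every copy whose source is outside /PROJECT.
# Assumes the English "Changed paths:" layout of `svn log -v`.
list_outside_copies() {
    awk -v proj="/${1#/}" '
        # Revision header, e.g. "r42 | user | date | 1 line"
        /^r[0-9]+ \|/      { rev = $1; inpaths = 0; next }
        /^Changed paths:$/ { inpaths = 1; next }
        # A blank line ends the changed-paths block
        /^$/               { inpaths = 0; next }
        # Path entries with a copy source: "   A /dest (from /src:REV)"
        inpaths && /^   [A-Z] \/.* \(from \/.*:[0-9]+\)$/ {
            line = substr($0, 6)              # drop the "   A " prefix
            i    = index(line, " (from ")
            dest = substr(line, 1, i - 1)
            src  = substr(line, i + 7)
            sub(/:[0-9]+\)$/, "", src)        # strip the trailing ":REV)"
            # Report only sources that are not the project or inside it
            if (src != proj && index(src, proj "/") != 1)
                print rev, dest, src
        }
    '
}
```

Something like this would feed it:

    svn log -v file:///path/to/master/$PROJECT | list_outside_copies "$PROJECT"

If it prints anything, those source paths would presumably also need to be included in the filter (or the project pulled some other way) before the filtered dump will load.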