I am going through a similar process myself and have some questions about
your concerns. I'm not trying to rock the boat, just looking for clarity on
a few points.
For perspective, I am working with around 300 individual projects
in a 70+ GB repository containing over 300k revisions.
> If I understand correctly, you manually retrieve each version where
> the given path/project has changed in any way to afterwards dump those
> revisions. Why is this better/faster than using svndumpfilter with
> specifying an include path, but without the need to post process the
> dump files?
I personally don't see the advantage to waiting around for svnadmin dump
to process every unrelated revision. For one project, I am only concerned
with about 200 revisions, spread out over 210k unrelated revisions.
# This example took around 8 hours:
svnadmin dump /path/to/master | svndumpfilter --drop-empty-revs \
--renumber-revs include $PROJECT > $PROJECT.dump
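# However, when I run this on the same project:
# (the "|" must be escaped, otherwise egrep treats it as alternation
# and matches every line of the log)
for rev in $(svn log -r0:HEAD file:///path/to/master/$PROJECT | egrep \
'^r[0-9]+ \|' | cut -d " " -f1); do
  # ${rev:1} strips the leading "r" from e.g. "r1234" (bash syntax)
  svnadmin dump --incremental -r ${rev:1} /path/to/master | svndumpfilter \
  include $PROJECT >> $PROJECT.dump
done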
… I can have a usable dump file in under 30 seconds. I realize this will
take longer for larger projects, but I think it makes my point. In the first
example, 'svnadmin dump' still has to create a full dump stream for every
revision in the repository before svndumpfilter sees that revision and
decides whether to keep it.
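
The revision-picking step can also be pulled out into a small helper, which
makes it easier to check on its own. This is only a sketch: revs_from_log is
a name I made up, and it assumes the usual "rNNN | author | date | ..."
header lines that 'svn log' prints.

```shell
# revs_from_log is a hypothetical helper name, not part of Subversion.
# It reads `svn log` output on stdin and prints the bare revision numbers.
revs_from_log() {
    # Header lines look like: r123 | user | date | 1 line
    sed -n 's/^r\([0-9][0-9]*\) |.*/\1/p'
}

# Usage (the path is a placeholder); -q suppresses log messages, so a
# message line that happens to start with "r123 |" cannot be picked up:
#   svn log -q -r0:HEAD "file:///path/to/master/$PROJECT" | revs_from_log
```

The output is one revision number per line, without the leading "r", so it
can be fed directly to 'svnadmin dump -r'.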
> Are you sure your approach doesn't need other paths
> from the repo, e.g. other source paths from copy operations for
> projects or stuff like that?
>
I absolutely agree with checking for this. You can't successfully pull out
a single path using svnadmin dump / svndumpfilter if there are copies from a
location outside of whatever you are filtering for.
I did notice that using svnrdump pointing to url/project seems to get
around the outside-copy-sources issue, but I think that’s another
discussion altogether.
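
One way to look for the outside copy sources mentioned above before
filtering is to scan the "(from /path:REV)" annotations that 'svn log -v'
prints under "Changed paths:". A rough sketch, not something I have run
against a large repository: outside_copy_sources is a made-up name, and it
assumes no path contains the literal text " (from ".

```shell
# outside_copy_sources is a hypothetical helper, not part of Subversion.
# $1 = project path inside the repository, e.g. /projA
# stdin = output of `svn log -v`
# Prints each copy source that lies outside the project but was copied
# to a destination inside it.
outside_copy_sources() {
    awk -v p="$1" '
        function inside(path) { return path == p || index(path, p "/") == 1 }
        /^   [A-Z] \/.* \(from \/.*:[0-9]+\)$/ {
            line = substr($0, 6)              # drop the "   A " action prefix
            cut  = index(line, " (from ")
            dest = substr(line, 1, cut - 1)
            src  = substr(line, cut + 7)      # skip past " (from "
            sub(/:[0-9]+\)$/, "", src)        # drop the ":REV)" suffix
            if (inside(dest) && !inside(src)) print src
        }' | sort -u
}

# Usage (the path is a placeholder):
#   svn log -v "file:///path/to/master/$PROJECT" | outside_copy_sources "/$PROJECT"
```

If this prints anything, those paths would also have to be included in the
filter (or the copies handled some other way) for the load to succeed.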
> > svnadmin dump $repo --quiet -r $rev --incremental >> $project.$rev.bak
>
> Adding to revision files with >> should be impossible in your
> approach.
Are you saying that appending to an existing dump file in general is a
problem or just with all of his node-path processing? I have had no
trouble appending to existing dump files.
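
For what it's worth, a quick sanity check on an appended dump file is to
count its revision records and compare that with the number of revisions
that were dumped. A minimal sketch; count_dump_revs is a made-up name, and
the count can be too high if a versioned file happens to contain a line
that looks like a revision record.

```shell
# count_dump_revs is a hypothetical helper, not part of Subversion.
# $1 = dump file; prints the number of "Revision-number: N" records.
count_dump_revs() {
    # -a: treat the dump as text even if it contains binary file data
    grep -a -c '^Revision-number: [0-9][0-9]*$' "$1"
}

# Usage:
#   count_dump_revs "$PROJECT.dump"
```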
Thanks,
Bryon Winger