I am going through a similar process myself and have some questions about 
your concerns. I'm not trying to rock the boat, just looking for clarity on 
a few points.
 
For perspective, I am working with around 300 individual projects
in a 70+ GB repository containing over 300k revisions.

> If I understand correctly, you manually retrieve each version where 
> the given path/project has changed in any way to afterwards dump those 
> revisions. Why is this better/faster than using svndumpfilter with 
> specifying an include path, but without the need to post process the 
> dump files? 

 
 
I personally don't see the advantage of waiting around for svnadmin dump 
to process every unrelated revision. For one project, I am only concerned 
with about 200 revisions, spread out over 210k unrelated revisions.

 

# This example took around 8 hours:

svnadmin dump /path/to/master | svndumpfilter --drop-empty-revs \
    --renumber-revs include $PROJECT > $PROJECT.dump

# However, when I run this on the same project:

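# The \| matches a literal pipe; a bare | would be treated as alternation
# by egrep and could match every line of the log output.
for rev in `svn log -r0:HEAD file:///path/to/master/$PROJECT |
           egrep "^r[0-9]+ \|" | cut -d " " -f1`; do
   # ${rev:1} strips the leading "r" (bash substring expansion)
   svnadmin dump --incremental -r ${rev:1} /path/to/master |
       svndumpfilter include $PROJECT >> $PROJECT.dump
done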

 

… I can have a usable dump file in under 30 seconds. I realize this will 
take longer for larger projects, but I think it makes my point. In the 
first example, ‘svnadmin dump’ is still creating a full dump stream for 
every revision before svndumpfilter sees that revision and decides whether 
to keep it.
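
For what it's worth, the revision-list step of the loop can be pulled out 
into a small helper. This is only a sketch: the function name revs_from_log 
is made up for illustration, and it assumes the usual "rNNN | author | date" 
header lines that svn log prints (using -q keeps log messages out of the 
input).

```shell
# Hypothetical helper: read `svn log -q` output on stdin and print the
# bare revision numbers, one per line, in the order they appear.
revs_from_log() {
    # Header lines look like "r1234 | author | date"; keep only the digits.
    sed -n 's/^r\([0-9][0-9]*\) |.*/\1/p'
}

# Possible wiring (REPO and PROJECT are assumed to be set by the caller):
#   svn log -q -r0:HEAD "file://$REPO/$PROJECT" | revs_from_log
```

Because it emits plain numbers, the ${rev:1} substring step would no longer 
be needed if the loop were fed from this helper.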

 

> Are you sure your approach doesn't need other paths 
> from the repo, e.g. other source paths from copy operations for 
> projects or stuff like that? 
>
 

I absolutely agree with checking for this. You can’t successfully pull 
out a single path using svnadmin dump / svndumpfilter if there are copies 
from a location outside of whatever you are filtering for.
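
One rough way to check for this up front is to scan the "(from /path:rev)" 
annotations that svn log -v prints for copied paths. The sketch below is an 
assumption on my part, not something from the original approach: the 
function name outside_copy_sources is hypothetical, and it reports every 
copy source outside the given prefix in the revisions it is shown, even if 
the copy destination is also outside the project, so it may over-report.

```shell
# Hypothetical helper: read `svn log -v` output on stdin and print each
# distinct copy source that lies outside the given repository path.
# Usage: svn log -v URL | outside_copy_sources /projectA
outside_copy_sources() {
    prefix=$1
    # Changed-path lines for copies look like:
    #    A /projectA/trunk (from /projectB/trunk:42)
    sed -n 's#^   [A-Z] .* (from \(/.*\):[0-9][0-9]*)$#\1#p' |
        # Drop sources that are the prefix itself or live underneath it.
        awk -v p="$prefix" 'index($0, p "/") != 1 && $0 != p' |
        sort -u
}
```

Empty output would suggest there are no outside copy sources in the 
revisions that were scanned.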

 

I did notice that using svnrdump pointing to url/project seems to get 
around the outside-copy-sources issue, but I think that’s another 
discussion altogether.

 

> > svnadmin dump $repo --quiet -r $rev --incremental >> $project.$rev.bak 
>
> Adding to revision files with >> should be impossible in your 
> approach. 

 
 
Are you saying that appending to an existing dump file in general is a 
problem or just with all of his node-path processing? I have had no 
trouble appending to existing dump files.

 

Thanks,

Bryon Winger
