Re: svnadmin dump: deleted file order differs between original & mirrored repos

Dustin Lang Tue, 29 May 2012 17:11:53 -0700


D'oh, I didn't RTFIssues.  My apologies.
  http://subversion.tigris.org/issues/show_bug.cgi?id=4134


--dstn


On Mon, 28 May 2012, Dustin Lang wrote:

Hi,
For backup purposes I keep a mirror of my svn repo. The mirror is modified only by "svnsync", which runs hourly in a cron job.
In order to validate the mirror, I run an "svnadmin dump" on the mirror and on the original, and assert that their md5sums are the same.
I am finding that in a few of the revisions in my history, in which a set of files are deleted, the svndumps on the original & mirrored repos will list the files in different orders, which of course makes the md5sums different even though the repos appear to be in the same state.
Both the original & mirror are fsfs-format, and I'm using subversion-1.7.1 on both sides. (I just checked that subversion-1.7.5 does the same thing.) The apr and other libs are likely different versions, though.
I tried to create a small test repo that demonstrates this behavior, but I haven't been able to trigger it. Argh. I've been running this backup approach for a long time and never seen this before, but it does show up in a few revs in my repo. (The repo is available at http://astrometry.net/svn, and rev 20053 shows this behavior, FWIW)
I added some debugging output to subversion/libsvn_delta/path_driver.c : svn_delta_path_driver() and it does visit the deleted files in the same order, but my guess is that since the deleted files get added to a hash (subversion/libsvn_repos/dump.c : delete_entry(), pb->deleted_entries) and then the hash gets iterated over later (in close_directory()), maybe the order of hash keys isn't defined, so the order they actually get written out can vary. But I've spent a total of maybe 15 minutes working with the subversion/apr code so your guess is better than mine.
Suggestions on how to proceed would be appreciated. My first guess would be to sort the deleted entries in close_directory() before writing them out, or use a list-like rather than hash-like data structure to store the deleted entries.
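
To illustrate the "sort before writing" idea, here is a minimal Python sketch. It is not the Subversion code: write_deletes() and digest() are hypothetical names, the delete record is simplified, and a Python dict's insertion order stands in for whatever order the APR hash happens to iterate in.

```python
import hashlib

def write_deletes(deleted_entries, sort_keys):
    """Render one simplified delete record per path (hypothetical helper)."""
    # Without sorting, the output order is whatever the container yields.
    paths = sorted(deleted_entries) if sort_keys else list(deleted_entries)
    return "".join(f"Node-path: {p}\nNode-action: delete\n\n" for p in paths)

def digest(text):
    """md5 of the rendered records, as the validation step would compute."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# The same set of deleted paths, held in two containers that iterate
# in different orders (standing in for the original and the mirror).
original = dict.fromkeys(["trunk/a.c", "trunk/b.c", "trunk/c.c"])
mirror = dict.fromkeys(["trunk/c.c", "trunk/a.c", "trunk/b.c"])

unsorted_match = (digest(write_deletes(original, False))
                  == digest(write_deletes(mirror, False)))
sorted_match = (digest(write_deletes(original, True))
                == digest(write_deletes(mirror, True)))

print(unsorted_match, sorted_match)  # False True
```

With sorting, the checksum depends only on which paths were deleted, not on how the container stored them.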
cheers,
dustin



Background: the dump is something like:

${SVNADMIN} dump -q --incremental -r 20000:HEAD ${MIRROR} | \
    grep -v Text-copy-source-md5 | \
    md5sum -
And I do the same on the remote side via ssh. The 20000 is there to make it run faster; I keep archives of the svndumps up to 20k.
(This does mean that if there is a change to the original repo between the svnsync and the svndump, the md5sums will come out different. This is a low-traffic repo so I actually like the occasional false alarm: if your smoke alarm goes off when you burn toast, at least you know it still works.)
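
For anyone who prefers a script to the pipeline, the same filter-then-checksum step might look roughly like this in Python. This is only a sketch: filtered_md5() and dump_md5() are hypothetical names, and the svnadmin arguments are copied from the command above.

```python
import hashlib
import subprocess

def filtered_md5(dump_lines, skip=b"Text-copy-source-md5"):
    """md5 over all lines that do not contain `skip` (like grep -v | md5sum)."""
    h = hashlib.md5()
    for line in dump_lines:
        if skip not in line:
            h.update(line)
    return h.hexdigest()

def dump_md5(repo_path, svnadmin="svnadmin", first_rev=20000):
    """Run an incremental dump of repo_path and checksum the filtered stream."""
    cmd = [svnadmin, "dump", "-q", "--incremental",
           "-r", f"{first_rev}:HEAD", repo_path]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
        # The dump stream is binary, so iterate over it as bytes.
        return filtered_md5(proc.stdout)
```

Calling dump_md5() on the mirror and comparing the result against the value computed on the remote side reproduces the check described above, including its sensitivity to the order of delete records.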
