On 2012/11/19 10:07 PM, Stefan Sperling wrote:
On Mon, Nov 19, 2012 at 04:11:01PM +0200, Gunther Mayer wrote:
So, I'm wondering if anyone from the community can help me. I think
I still have all of the original files which got written or amended
during the three broken revisions (in one or more working copies),
but one of these revisions is about 1.5GB so sharing is a bit tricky
(the other two are 23MB and 1.7MB). I'm even willing to pay somebody
to do the job if that's what's necessary, I only want to recreate my
repository from scratch as a LAST resort as I would lose all of my
history.
I've dealt with similar corruption problems in the past, where original
fulltext file content was still available.

So maybe this hint will help you: You might be able to create good
representations by committing the fulltext files to a fresh temporary
repository, possibly in multiple commits in the right order if you have
more than one version of a file available.
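
As a rough illustration of that first step, the sketch below creates a throwaway repository and imports a directory of recovered fulltext files as a single commit. `make_temp_repo`, its argument, and the `recovered` target path are hypothetical names chosen for this example; only `svnadmin create` and `svn import` are real Subversion commands. It assumes the temporary repository is created by a Subversion version that writes the same FSFS format as the broken one. Where several versions of a file exist, one commit per version would have to follow in the right order.

```shell
#!/bin/sh
# Sketch only: build a scratch repository holding known-good fulltexts.
# make_temp_repo is a hypothetical helper name, not a Subversion command.
make_temp_repo() {
    src_dir=$1                      # directory with the recovered files
    work=$(mktemp -d) || return 1   # scratch area for the temporary repository
    svnadmin create "$work/repo" || return 1
    # Import everything under src_dir as revision 1 of the scratch repository.
    svn import -q -m "known-good fulltexts" "$src_dir" \
        "file://$work/repo/recovered" || return 1
    # Print the repository path; with the default sharded FSFS layout,
    # revision 1 is expected under db/revs/0/1.
    echo "$work/repo"
}
```

Calling `make_temp_repo /path/to/recovered/files` prints the location of the scratch repository, whose revision file can then be inspected for the new reps.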

Extract these reps from the FSFS data of the temporary repository and
stitch them into the broken repository at appropriate places, recalculating
checksums where necessary, and tweaking offsets and maybe adding some padding
if necessary. In case the good reps use less space than the bad ones, or
the exact same amount, they can be made to work fairly easily.
If they end up being larger, things get a bit more tricky. Note that
due to the way FSFS revisions are parsed by Subversion (it looks at the
end of the file for the changed-path data section offset first) you can
move the changed-path data section further down to create more space in
an existing revision file -- but you cannot move any other existing sections
by even a single bit!

I've managed to fix several corrupt revisions like this. The problem at the
time was similar: an elego customer's SVN server was running in a VM, and
when the host computer unexpectedly lost power, revision data in several
FSFS files didn't get saved to the physical disks in time... oops!
They were able to get some fulltext files from working copies which we could
use to recreate some of the lost reps.

Some related reading material (read in given order):
https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_base/notes/fs-history
https://svn.apache.org/repos/asf/subversion/trunk/subversion/libsvn_fs_fs/structure

Good luck!


Thanks again for the advice. I ended up fixing each of the three revisions by repeatedly running fsfsverify.py until it got stuck, truncating the affected node, and repeating. Once a revision verified cleanly, I re-added all the truncated files from a working copy backup as a brand-new revision. I was lucky: every single node I truncated was a "leaf" or "terminal" node, i.e. it was never modified again afterwards, so I was able to do this without repercussions in the rest of the repository history.

I automated the entire process, which worked like a charm even on my 1.5GB revision. I'm pasting the code below in the hope that one day it will help somebody in a similar predicament (run it with the revision in question as the only argument, after making a backup copy of it of course):

#!/bin/bash

rev=$1
#dir=/backup/svn/main/db/revs/*/
f=progress_r${rev}_repair
i=0
same_count=1
same_max=10 # maximum number of attempts during which we tolerate encountering the SAME error
while true; do
    let i++
    echo Fix attempt $i >> $f
    if ! ./fsfsverify.py $rev > last; then # something's wrong, try to fix it
        tail -n3 last >> $f
        cur_md5=$(tail last | md5sum)
        if [ "$cur_md5" = "$old_md5" ]; then
            let same_count++
            if [ $same_count -ge $same_max ]; then # encountered the SAME error too many times, give up and truncate the offending node
                noderevid=$(grep NodeRev last | tail -n1 | egrep -o '[-0-9.a-z]+/[0-9]+')
                cpath=$(grep cpath last | tail -n1 | egrep -o '/.*$')
                echo -e "Encountered same error $same_max times, giving up and truncating the following node:\n$cpath" >> $f
                ./fsfsverify.py -t $noderevid $rev >> $f 2>&1
                same_count=1
                continue # skip the fix attempt below
            fi
        else # reset
            same_count=1
        fi
        ./fsfsverify.py -f $rev >/dev/null
        old_md5="$cur_md5"
    else # no errors from fsfsverify.py, we're done
        break
    fi
done
echo done >> $f
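
Since the script above rewrites the revision file in place, the backup step is worth spelling out. The following is a minimal sketch of one way to do it; `backup_rev` and the `.orig` suffix are hypothetical choices for this example, not something fsfsverify.py provides.

```shell
#!/bin/sh
# Sketch only: keep an untouched copy of a revision file before repairing it.
# backup_rev is a hypothetical helper name.
backup_rev() {
    revfile=$1
    backup="$revfile.orig"
    # Never clobber an earlier backup, which may be the only pristine copy.
    if [ -e "$backup" ]; then
        echo "backup $backup already exists, not overwriting it" >&2
        return 1
    fi
    # Copy, then confirm the copy is byte-for-byte identical.
    cp -p "$revfile" "$backup" && cmp -s "$revfile" "$backup"
}
```

Something like `backup_rev 1234 && ./repair.sh 1234` would then run the repair only if the backup succeeded (`repair.sh` being a hypothetical file name for the script above, and `1234` a placeholder revision file).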
