How to fix corrupt revision in repo?

2011-09-14 Thread David Hopkins
Greetings, 

I have an SVN repo that is failing svnadmin verify on revision 192. 

For some reason the verify output says: 

[...various successful revisions...] 
* Verified revision 191. 
svnadmin: Can't read file 'E:\Repositories\Client_Name\db\revs\0\162': End of file found

I'm not sure why the verification command for revision 192 would throw 
an error description for revision 162. Revision 192 affected a 
completely different part of the repository to revision 162, so there is
no obvious relationship between them.

All the revisions from 193 to 332 (HEAD) are ok. It might be a one-off.

This looks like an FSFS-backed repository (I am very new to SVN and
inherited the server from someone else!). The server is VisualSVN 2.1.4,
which is based on SVN 1.6.13.

The clients are mostly TortoiseSVN 1.6.16, which uses SVN 1.6.17. 

What steps should I take to fix the corrupted revision? Is there more
information that I should provide? (e.g. a copy of the rev 192 file?)
This problem is causing checkouts and updates to fail for files that
were last modified in that revision.

Regards, 

David Hopkins 
Serck Controls


= PRIVACY AND CONFIDENTIALITY NOTICE =
The information contained in this message is intended for the named recipient 
only.  It may contain privileged and confidential information.  If you are not 
the intended recipient, you must not copy, distribute, take any action in 
reliance on it, or disclose any details of the message to any person, firm or 
corporation. If you have received this message in error, please notify the 
sender immediately by reply e-mail and delete all copies of this transmission 
together with any attachments.
The views or opinions expressed in this e-mail or any attachment are not 
necessarily those of Serck Controls Pty Ltd.
NOTE - You should carry out your own virus checks before opening any attachment.



RE: How to fix corrupt revision in repo?

2011-09-15 Thread David Hopkins
Thanks Daniel. Responses inline.

>> David Hopkins wrote on Thu, Sep 15, 2011 at 10:30:46 +0800:
>> I'm not sure why the verification command for revision 192 would throw
>> an error description for revision 162. Revision 192 affected a
>> completely different part of the repository to revision 162, so there
>> is no obvious relationship between them.
>
> Possibly due to rep-sharing? Does db/revs/0/192 contain the number "162"
> in ASCII decimal delimited by whitespace? You can check that with the
> following command: grep -a "^text:" db/revs/0/192

Yes it does.
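
For anyone without grep on a Windows server, the check above can be approximated in Python. This is a minimal sketch, not an official tool: the function name text_lines and the sample bytes are made up for illustration, and it assumes the rev file has been read in binary mode, since rev files mix ASCII headers with binary representation data.

```python
def text_lines(rev_bytes):
    """Return every line that starts with 'text:', roughly like
    grep -a "^text:" does on a binary file."""
    found = []
    for raw in rev_bytes.split(b"\n"):
        if raw.startswith(b"text:"):
            # Header lines are ASCII; replace anything unexpected
            # instead of failing on stray binary bytes.
            found.append(raw.decode("ascii", errors="replace"))
    return found


# Made-up stand-in for a rev file: binary junk plus two node-rev headers.
sample = (
    b"DELTA\n\x00\x01\xffbinary\nENDREP\n"
    b"id: a-1.0.r192/10\ntype: file\ntext: 192 0 20 30 aaaa\n\n"
    b"id: b-1.0.r192/50\ntype: file\ntext: 162 670867 6111 52486 bbbb\n\n"
)

for line in text_lines(sample):
    print(line)
```

On a real repository the input would come from something like open(path, "rb").read() on a copy of db/revs/0/192; the path is whatever applies to your installation.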

Here's the context in the rev file:
id: y-31.0-32.r192/673830
type: file
pred: y-31.0.r31/264323
count: 1
text: 162 670867 6111 52486 5117fb0964ca1a78dd97447d23452e73 609f4745460d6e14860daff0803ee7024c54898c 191-5r/_m
cpath: [redacted]
copyroot: 32 [redacted]

Looking at the other nearby entries, they have "text: 192 [...]" 
instead of "text: 162 [...]". Is that likely to be the problem?
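
To make the fields of that "text:" line easier to compare across node-revs, here is a small Python sketch that splits one into named parts. The field labels (revision, offset, length, expanded_size and so on) are my own names, based on my understanding of the layout described in subversion/libsvn_fs_fs/structure; treat them as assumptions rather than official terminology.

```python
from typing import NamedTuple, Optional


class TextRep(NamedTuple):
    revision: int               # rev file that holds the representation
    offset: int                 # byte offset into that rev file
    length: int                 # bytes the representation occupies on disk
    expanded_size: int          # size of the reconstructed fulltext
    md5: str
    sha1: Optional[str]         # only present in newer formats
    uniquifier: Optional[str]   # only present in newer formats


def parse_text_line(line):
    """Split a node-rev 'text:' header line into named fields."""
    key, _, value = line.partition(":")
    if key != "text":
        raise ValueError("not a text: line: %r" % line)
    fields = value.split()
    if len(fields) < 5:
        raise ValueError("too few fields in: %r" % line)
    rev, offset, length, size = (int(f) for f in fields[:4])
    sha1 = fields[5] if len(fields) > 5 else None
    uniquifier = fields[6] if len(fields) > 6 else None
    return TextRep(rev, offset, length, size, fields[4], sha1, uniquifier)


rep = parse_text_line(
    "text: 162 670867 6111 52486 5117fb0964ca1a78dd97447d23452e73 "
    "609f4745460d6e14860daff0803ee7024c54898c 191-5r/_m"
)
print(rep.revision, rep.offset, rep.length)
```

A node-rev in the r192 file whose parsed revision is not 192 is not necessarily wrong, but it is the one worth a closer look.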

> Does 'svnadmin dump -r 162 >NUL' work ? 

Yes it does.

> To answer your question: yes, most definitely a copy of the r192 (and/or
> r162) rev file would make it possible to pinpoint the problem; however,
> you might not want to share those files on a public list as they may
> contain sensitive data (versioned file contents).

I'll find out if I can release the broken revisions in their entirety.

The corrupted revision doesn't actually contain anything particularly
important (almost all the modified files in it have since been replaced
by newer versions anyway). Can I fix the repository by dumping every
revision except 192, and then reloading the good revisions into a new
repo? Or will that cause problems for the revisions after 192, since one
of the revisions no longer exists?

Regards,

David Hopkins
Serck Controls





RE: How to fix corrupt revision in repo?

2011-09-15 Thread David Hopkins
David Hopkins wrote on Fri, Sep 16, 2011 at 08:30:14 +0800:
> > Here's the context in the rev file:
> > id: y-31.0-32.r192/673830
> > type: file
> > pred: y-31.0.r31/264323
> > count: 1
> > text: 162 670867 6111 52486 5117fb0964ca1a78dd97447d23452e73 609f4745460d6e14860daff0803ee7024c54898c 191-5r/_m
>
> That tells you that the 6111 bytes starting at offset 670867 (bytes) into
> the r162 rev file are a representation generating a file whose checksums
> and uniquifier are given later.  See subversion/libsvn_fs_fs/structure
> for details --- basically, it's DELTA\n or PLAIN\n up through ENDREP\n.

That's interesting. It certainly explains the "end of file" error message
that is getting thrown, because rev 162 is only 1,506 bytes long.
Rev 162 was a deletion of a single file from a different folder in the
repo so I'd be surprised if it contained any file representations at all.
Rev 192 is 683,471 bytes long, so it *is* long enough for a 670867 byte
offset to make sense.
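
The size reasoning above is just a bounds check, sketched here in Python with the numbers quoted in this thread. The helper name rep_fits is hypothetical, and the check only says whether the byte range could exist in a file of that size, not whether a valid representation actually starts there.

```python
def rep_fits(offset, length, file_size):
    """True if the byte range [offset, offset + length) lies inside a
    file of file_size bytes."""
    return offset >= 0 and length >= 0 and offset + length <= file_size


OFFSET, LENGTH = 670867, 6111   # from the "text:" line in r192
R162_SIZE = 1506                # size of db/revs/0/162 in bytes
R192_SIZE = 683471              # size of db/revs/0/192 in bytes

print(rep_fits(OFFSET, LENGTH, R162_SIZE))  # False: far past the end of r162
print(rep_fits(OFFSET, LENGTH, R192_SIZE))  # True: 676978 <= 683471
```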

> > 
> > Looking at the other nearby entries, they have "text: 192 [...]" 
> > instead of "text: 162 [...]". Is that likely to be the problem?
> > 
>
> It's normal for r192 to contain "text: 162" if rep-sharing is enabled or
> if you did a copy-without-textmods from r162.
>

Ok. I think rep-sharing is probably enabled because this server was
installed using SVN 1.6, and we haven't altered the setting. (It's on by
default, yes?)

But, I can see from the CPATH which file in r192 is referencing r162 
(EDGE.CSV), and that reference doesn't make sense.

The history of EDGE.CSV is as follows:

R31: EDGE.CSV added to repo.

R32: one of the directories in EDGE.CSV's parent path was renamed.

(R162: a single file in a completely different part of the repo was
deleted. Literally the only part of their file paths in common was the
repo root folder. EDGE.CSV and the deleted file have no shared history,
relationships, or even data in common - one is a CSV file and the deleted
file was a binary archive!)

R192: EDGE.CSV was modified, along with several other files in the same
folder. I've now checked, and every single other text: field in R192
references R192. There are no other revisions referenced.

R335: EDGE.CSV was deleted. This is because that file wasn't very
important, and all the other files which changed in r192 were updated in
later revisions and apparently can be successfully checked out/updated.

> > > To answer your question: yes, most definitely a copy of the r192
> > > (and/or r162) rev file would make it possible to pinpoint the
> > > problem; however, you might not want to share those files on a
> > > public list as they may contain sensitive data (versioned file
> > > contents).
> >
> > I'll find out if I can release the broken revisions in their entirety.
> >
>
> Perhaps someone would be willing to have a look at those two revision
> files privately.
>
> (In fact, I might be able to do this too.  But I'm reluctant to make
> a promise or commitment about this.)
>
> > The corrupted revision doesn't actually contain anything particularly
> > important (almost all the modified files in it have since been
> > replaced by newer versions anyway). Can I fix the repository by
> > dumping every revision except 192, and then reloading the good
> > revisions into a new repo? Or will that cause problems for the
> > revisions after 192, since one of the revisions no longer exists?
> >
> > 
>
> That won't work if files after r192 are stored as deltas against the
> fulltext of r192.
>

Hmm, ok.

I'm thinking about making a copy of the repository folder, and seeing
what happens if I replace "text: 162" with "text: 192" in revs\0\192,
since the offsets appear to pass the "smell test" for file size. Is there
_any_ chance that that will work? Or are there other references I would
also need to patch inside the revs\0\192 file?

I thought I'd try doing an svnadmin dump and then using svndumpfilter to
exclude EDGE.CSV, since it seems to be the only thing with an invalid rev
reference, but the svnadmin dump operation fails when it gets to r192,
since it can't process the reference to r162 either.

Regards,

David Hopkins
Serck Controls





RE: How to fix corrupt revision in repo?

2011-09-18 Thread David Hopkins
> Daniel Shahaf wrote:
> One more thing.  The fact that in r162 one file was deleted *and no
> files were added or changed* implies that the only new representations
> in r162 would be directory representations --- it wouldn't add any
> *file* representations --- so the reference to r162 in the node-rev
> header (the sequence of ASCII lines of which the "text:" line is part)
> is almost certainly bogus.
> 
> I'm curious to hear whether the problem was indeed that the noderev
> referred to r162 instead of r192.

Sadly, it wasn't. I've now experimented with that. The offset supposedly
within r162 is listed as 670867 bytes, which is well outside the total
length of r162 as we've already discussed. But it isn't a valid pointer
within r192 either; offset 670867 points to the middle of one of the
other rep blocks within the r192 file. I've had a look at the other
node-rev headers and it appears that all the rep blocks in the r192 file
are fully accounted for by the node-revs which have text: 192. (That is,
there are no representations in the r192 file which don't already have a
valid node-rev header).

I've had a look through all the revs between 162 and 192 which are at
least 600 KiB in size. But I can't find *any* rev files in the whole
repository history leading up to 192 where an offset of 670867 points to
the beginning of a DELTA or PLAIN representation.
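
The manual check described above (does a given offset land on the start of a representation?) could be scripted along these lines. This is a sketch under assumptions: the function name starts_rep_at is made up, and it relies on my reading of the structure document that a representation begins with a header line of either "PLAIN", "DELTA", or "DELTA" followed by a base reference.

```python
def starts_rep_at(rev_bytes, offset):
    """True if a PLAIN or DELTA header line begins exactly at offset."""
    if offset < 0 or offset >= len(rev_bytes):
        return False
    end = rev_bytes.find(b"\n", offset)
    if end == -1:
        return False
    header = rev_bytes[offset:end]
    return (
        header == b"PLAIN"
        or header == b"DELTA"
        or header.startswith(b"DELTA ")
    )


# Made-up miniature rev file with one PLAIN and one DELTA representation.
sample = b"PLAIN\nhello\nENDREP\nDELTA 5 10 20\n\x00\x01ENDREP\n"
delta_at = sample.index(b"DELTA")

print(starts_rep_at(sample, 0))         # offset of the PLAIN header
print(starts_rep_at(sample, delta_at))  # offset of the DELTA header
print(starts_rep_at(sample, 8))         # middle of the PLAIN payload
```

Run over copies of each candidate rev file with offset 670867, this would reproduce the search described above without opening every file in a hex editor.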

So, I'm now assuming that both the reference to r162, and the offset of
670867 bytes, are bogus. But there aren't any obvious candidates for a
non-bogus representation of that particular file update.

Given that the file with the bogus node-rev is unimportant, and has
since been deleted from the repo, is there any way to patch the r192
rev-file so that the repository has enough internal consistency to
produce a valid dump file?

At the moment it looks like the "nuclear option" is to check out the
current version of everything and start a new repository with it. This
*should* work because the corrupted file isn't included in recent
revisions, so SVN won't need to de-reference the invalid reference in
r192 when performing the check out. But if I can purge the broken-ness
from the repo and keep the rest of the history, that would obviously be
better. I certainly don't want to keep using a repo that doesn't
validate and can't be dumped, though.

>
> Daniel Shahaf wrote on Fri, Sep 16, 2011 at 15:37:11 +0300:
> > Quick reply, more verbose one might follow up later.
> >
> > Your reply breaks the nested quoting levels, please try to avoid it,
> > are you sending mail as text/plain?
> >

Sorry about breaking the nested quoting. I'm using Outlook which is
pretty mediocre as a plain-text email client. I was already using
text/plain, but Outlook's quoting style wasn't right, so I was trying to
manually fix the text-wrapping and quote marks. Clearly I wasn't getting
it right. I've now found a couple more Outlook settings which will
hopefully address the problem.

Unfortunately, it doesn't look like I'll be able to send you the actual
rev file(s), at least not without a lot of inconvenience that I don't
want to subject you to (i.e. an NDA, since we don't actually own the IP
to any of the code which may be included in the rev file). Sorry about
that.

Regards,

David Hopkins





RE: How to fix corrupt revision in repo?

2011-09-19 Thread David Hopkins
> Daniel Shahaf wrote on Monday, 19 September 2011 9:27 PM:
> You ought to be able to keep the rest of the history even without fixing
> the brokenness in r192.  (as the file is deleted in HEAD, a checkout
> should work; and you also have the option of dumping the history while
> excluding the problematic file from it (via authz+svnsync/svnrdump or
> svndumpfilter).)

I'll look into the authz+svnsync/svnrdump option. Svndumpfilter doesn't
work for me because the 'svnadmin dump' operation fails when it tries to
process 192 (before I get a chance to use svndumpfilter to eliminate the
bogus file). As far as I can tell svndumpfilter operates on dumpfiles
that already exist, and can't actually stop svnadmin from trying to
resolve the bogus node-rev header during the dump process. The
authz+svnsync solution will hopefully allow me to effectively do that
filtering at an earlier stage in the pipeline. 

Thank you very much for all your help,

David Hopkins
Serck Controls





RE: How to fix corrupt revision in repo?

2011-09-19 Thread David Hopkins
> > Daniel Shahaf wrote on Monday, 19 September 2011 9:27 PM:
> > You ought to be able to keep the rest of the history even without
> > fixing the brokenness in r192.  (as the file is deleted in HEAD, a
> > checkout should work; and you also have the option of dumping the
> > history while excluding the problematic file from it (via
> > authz+svnsync/svnrdump or svndumpfilter).)
>
> I'll look into the authz+svnsync/svnrdump option. Svndumpfilter doesn't
> work for me because the 'svnadmin dump' operation fails when it tries to
> process 192 (before I get a chance to use svndumpfilter to eliminate the
> bogus file). As far as I can tell svndumpfilter operates on dumpfiles
> that already exist, and can't actually stop svnadmin from trying to
> resolve the bogus node-rev header during the dump process. The
> authz+svnsync solution will hopefully allow me to effectively do that
> filtering at an earlier stage in the pipeline.

For the benefit of anyone else who comes across this message thread in
the future, I thought I'd post a final follow-up message with my
results.

The authz+svnrdump solution *did* work for creating a dumpfile without
references to the corrupted file revision. I ended up setting up a
temporary server where I could set custom authz permissions, and
downloaded a beta SVN 1.7 client so that I could use svnrdump rather
than svnsync (which was much simpler to set up). I've successfully
loaded the purged dumpfile into a new repository which now works with
svnadmin verify, svnadmin dump, svnadmin hotcopy etc.
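
For readers wanting a starting point, the authz side of this can be sketched as a rule set that grants read access to the whole repository and then denies it on the broken path. The Python helper below only generates such a file as text. The name build_authz and the path /hypothetical/dir/EDGE.CSV are illustrative assumptions (the real path was redacted in this thread), and the exact section syntax may need a repository-name prefix depending on how the temporary server is configured.

```python
def build_authz(hidden_paths, reader="*"):
    """Build authz text: read access everywhere, no access on the
    listed paths, so a remote dump cannot see them."""
    lines = ["[/]", "%s = r" % reader, ""]
    for path in hidden_paths:
        lines.append("[%s]" % path)
        lines.append("%s =" % reader)   # empty value means no access
        lines.append("")
    return "\n".join(lines)


# Hypothetical path; substitute the real repository path of the file.
print(build_authz(["/hypothetical/dir/EDGE.CSV"]))
```

With rules like these in place on the temporary server, running svnrdump dump against its URL and redirecting the output to a file should produce a dumpfile that omits the hidden path. Authz rules are path-based, and a parent directory of EDGE.CSV was renamed in r32, so any older path the file lived under may need its own entry.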

Thanks once again for all your help (especially the authz+svnrdump
suggestion).

Regards,

David Hopkins
Serck Controls

