Hi Bert,

On Wed, Sep 16, 2015 at 2:33 AM, Bert Huijben <[email protected]> wrote:
>
> > -----Original Message-----
> > From: Andreas Mohr [mailto:[email protected]]
> > Sent: Wednesday 16 September 2015 07:48
> > To: Eric Johnson <[email protected]>
> > Cc: [email protected]; [email protected]
> > Subject: Re: Incomplete SVN dump files
> >
> > Hi,
> >
> > On Tue, Sep 15, 2015 at 05:26:38PM -0700, Eric Johnson wrote:
> > > I just checked, and there aren't any open bugs about this.
> > > Interrupting svnrdump can result in a dump file with not all the
> > > files of the last commit in the dump record. Accidentally use that
> > > dump file to load into a new repository, and the resulting
> > > repository will not be a copy of the original.
> > > My particular use case, I was trying to suck down a large
> > > repository. Connection interrupted part way through. I resumed
> > > from part way through (using the --incremental option) into an
> > > additional dump file. Then did a load of those two dump files. Did
> > > not yield a copy of the original repository, though.
> > > This seems like a critical issue for possible data loss when
> > > copying repositories from machine to machine using svnrdump.
> >
> > AFAICS (not an svnrdump expert here) very well described and to the
> > point.
> > You just managed to pinpoint a rather important serialization format
> > that seemingly isn't fully properly atomically transaction-safe...
> > (good catch!)
>
> In some ways a dumpfile is a stream and not a file... and when you use
> the commandline tools you always obtain it from stdout.
>
> I could argue that you in that case should check if the operation
> exited successfully or with an error.

In my specific case, I'm trying to pull gigabytes of data from Europe to
the western US, and apparently I cannot depend on the connection staying
stable long enough to last for the whole download. So if the dump of the
last commit is incomplete, an error code tells me what, exactly? That I
need to manually edit the stream that I just dumped into a file? That I
should discard the whole dump and start again?

> After an error you can't trust that the final portion is ok.

Sure, but why not encode that in the dump itself? The absence of an
"end-commit" trailer could be a signal to every tool that uses the dump
that the commit is not complete, and the transaction could be discarded!

> The stream was also deliberately designed in a way that you can
> incrementally generate it... E.g. after each new revision or as a
> daily backup operation.
>
> Adding some 'this is the end' marker would break those use cases, that
> we have been using since the day subversion was self-hosted. (Long
> before 1.0)

Sounds like an argument for a "start commit / end commit" frame in the
dump. If you want to support this use case, adding an "end-of-stream"
marker at the end of the stream wouldn't be sufficient. Right now, the
dump file apparently just has a "start commit" indicator. So it breaks
everything.

> And when loading from a stream we can't continue reading to the end to
> see if there is a final marker, as at that point we aren't able to go
> back to the start and start the whole process.
> (I've used '$ svn dump .... | ssh .... svnadmin load ...' more than a
> few times for repository migrations)

SVN claims to be transactional with commits. Surely svnadmin load can
discard the last commit from a load if it was incomplete. Actually,
doing anything else is just asking for occasional data corruption.

I'm filing an issue.

Eric
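
To illustrate the "discard the incomplete last commit" idea, here is a minimal Python sketch, not anything svnadmin or svnrdump actually does. It assumes the dump stream is a sequence of records, each a block of "Key: value" header lines ended by a blank line and optionally followed by a body of Content-length bytes, with each revision opened by a Revision-number header. The names scan_dump and trusted_length are hypothetical. Because the stream has no end-of-commit marker, a cut that falls exactly on a record boundary cannot be detected from the bytes alone, which is why the sketch also takes the exit status of the dumping process.

```python
import io


def scan_dump(f):
    """Walk the records of a dump stream opened in binary mode.

    Returns (revision_offsets, truncated): the byte offset at which each
    revision record starts, and whether the stream ended in the middle of
    a header block or a body.
    """
    revision_offsets = []
    while True:
        start = f.tell()
        line = f.readline()
        if not line:
            return revision_offsets, False  # ended on a record boundary
        if line == b"\n":
            continue  # blank separator between records
        headers = {}
        while line != b"\n":
            if not line.endswith(b"\n") or b": " not in line:
                return revision_offsets, True  # cut off inside the headers
            key, value = line[:-1].split(b": ", 1)
            headers[key] = value
            line = f.readline()
        if b"Revision-number" in headers:
            revision_offsets.append(start)
        # Records without a body (assumed to omit Content-length) read 0 bytes.
        length = int(headers.get(b"Content-length", b"0"))
        body = f.read(length)
        if len(body) < length:
            return revision_offsets, True  # cut off inside the body


def trusted_length(f, exited_cleanly):
    """Return how many leading bytes of the dump can be trusted.

    If the stream is visibly truncated, or the dumping process reported an
    error, everything from the start of the last revision onward is dropped,
    since that revision may be missing nodes.
    """
    revision_offsets, truncated = scan_dump(f)
    end = f.tell()
    if truncated or not exited_cleanly:
        return revision_offsets[-1] if revision_offsets else 0
    return end
```

A caller could truncate the saved file to trusted_length(...) bytes and resume the dump from the dropped revision. An explicit end-of-commit record in the format would make the exited_cleanly argument unnecessary.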
