Hi,

On Tue, Sep 15, 2015 at 05:26:38PM -0700, Eric Johnson wrote:
> I just checked, and there aren't any open bugs about this.
>
> Interrupting svnrdump can result in a dump file with not all the files of
> the last commit in the dump record. Accidentally use that dump file to
> load into a new repository, and the resulting repository will not be a
> copy of the original.
>
> My particular use case, I was trying to suck down a large repository.
> Connection interrupted part way through. I resumed from part way through
> (using the --incremental option) into an additional dump file. Then did a
> load of those two dump files. Did not yield a copy of the original
> repository, though.
>
> This seems like a critical issue for possible data loss when copying
> repositories from machine to machine using svnrdump.

AFAICS (not an svnrdump expert here), this is very well described and to
the point. You have pinpointed a rather important serialization format
that seemingly is not fully transaction-safe (good catch!).

> I suspect the right solution to this is to put an "end of file" marker at
> the end of a dump stream. If it isn't there, then svnadmin load will see
> its absence, and must discard the last commit.

However, a file-level "end of payload" marker does not necessarily cut it,
since the "file" is merely a (rather unrelated) outer transport container
for a variable number of inner sub-elements of data. In other words, the
payload of each meaningful sub-element within the complete transmission
ought to be (or rather: MUST be?) fully verifiable in itself.

To make this more evident: inferring "discard this broken commit" from a
foreign event ("the transmission end marker is missing") is much more
indirect than inferring it from the commit's own data unit failing a
cryptographic/checksum/length check *of this unit proper*.

(Oh, and what about not only having to discard the last commit, but also
detecting and discarding other commits within the stream that happen to
contain breakage? Talk about fully provided transaction safety...)

And then there is the question of whether it is the serialization format
itself that should add markers for what constitutes a "complete" sub-unit,
or whether the higher layer should inherently/implicitly realize whether
the chunks of data it received constitute a "complete" sub-unit (think
layering, e.g. ISO etc.).
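
To illustrate what "verifiable in itself" could mean, here is a minimal
sketch in Python. It is NOT the Subversion dump format; the framing (a
4-byte length plus a SHA-256 digest in front of each unit) and the names
write_record / read_records are assumptions made up for this example.

```python
import hashlib
import io
import struct

# Hypothetical framing, not the svn dump format:
# 4-byte big-endian payload length, then a 32-byte SHA-256 digest.
HEADER = struct.Struct(">I32s")


def write_record(stream, payload: bytes) -> None:
    """Frame one unit (e.g. one revision) with its own length and digest."""
    stream.write(HEADER.pack(len(payload), hashlib.sha256(payload).digest()))
    stream.write(payload)


def read_records(stream):
    """Yield (payload, ok) pairs; ok is False for a truncated or corrupt unit."""
    while True:
        header = stream.read(HEADER.size)
        if not header:
            return  # stream ended exactly on a unit boundary
        if len(header) < HEADER.size:
            yield b"", False  # cut off inside a header
            return
        length, digest = HEADER.unpack(header)
        payload = stream.read(length)
        ok = len(payload) == length and hashlib.sha256(payload).digest() == digest
        yield payload, ok
        if len(payload) < length:
            return  # cut off inside a payload; nothing more to read
```

A loader built on this could refuse any unit whose flag is False, whether
it is the last one or sits in the middle of the stream. Note that a stream
cut exactly on a unit boundary still looks clean to this sketch, so
per-unit checks do not tell the reader how many units to expect.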

OTOH, since the serialization format *is* generated by just *that*
higher-level layer on the other side of the parser (probably also
svnrdump, right?), *that* layer fully defines and controls the entire
serialization format, and thus probably should insert sub-unit
boundary/validity markers (perhaps via a chunked file format or some such).

But these thoughts of mine could possibly be relegated to the "ramblings"
area, since after all it is a simple(?) matter of thoroughly researching
current best practice for implementing transaction-safe serialization
formats and then achieving just such a correct implementation... ;)

Andreas Mohr
