Re: svndumpfilter and svnsync?
Hi again,

I managed to get some better permissions so I don't have to do svnsync and can get by with doing incremental dumps/loads, but I'm a bit confused by the svndumpfilter + load process, so any help would be appreciated.

First of all, my statement about the dump taking 2 weeks was a big fat urban legend. More like 20 minutes, so that's good news.

I've trawled through bad commits of data files in our repo and added such paths to a filter file that I'm using for svndumpfilter to get a reasonable-looking dump. In most cases, the files in question existed in a single path (branch) and were no problem. But in some cases, the same files had been copied to a 2nd branch and then svndumpfilter gave me errors about missing source paths, so I added the same path on the 2nd branch to the filter expressions and tried again. After a few iterations of this process, I have a dump that should do what I want.

So I start "svnadmin load" and, based on initial progress, that might take a couple of days to complete, so I leave it overnight. I get back today and the load has crashed with a missing path. The error was:

  svnadmin: E160013: File not found: transaction '16289-ckh', path 'branches/second/dir/datafile'

And looking up the history for that file, I see that "datafile" was added on branch "first", but the path "branches/first/dir" is already in my filter list. So why didn't svndumpfilter throw me an error on this like it did for a lot of other cases?

Since the load process is so much slower, the turnaround time for each error in that step is beyond painful, so if there's anything I can do to ensure that this gets caught by the filter, it would make my life a lot easier.
The syntax I used:

  svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
  svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump

(I had to use the bypass-prop-validation due to some newline issues in old log messages, similar to this one: https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA. I don't know why they have wrong newlines, but the repo works as it is now...)

An additional question about what Johan wrote below:

> - You can perfectly well use a 1.10 version of svnadmin or svnsync (or
> svnrdump, to create a dumpfile from a remote server) to interact with a
> 1.8 server / repository.

Can I even do this with "svnadmin load"? I thought that would use an FSFS version 8 while 1.8 should have 6. I got that impression from my "research", but I'm probably off base.

TIA,
Chris

On Thu, 10/4/18, Johan Corveleyn wrote:

Subject: Re: svndumpfilter and svnsync?
To: "Chris"
Cc: "Ryan Schmidt", "Daniel Shahaf", "Subversion"
Date: Thursday, October 4, 2018, 4:26 PM

On Thu, Oct 4, 2018 at 3:03 PM Chris wrote:
>
> (apologies for the top-posting, I really need to stop using this yahoo web
> interface which is useless with quoting)
>
> Thanks for all the replies. I'll try out what you outlined. There are
> unfortunately problems outside of my control that make it worse: for
> company-internal policy reasons, I'm not allowed direct access to the
> server. I'm only able to get a copy of the repo to work with and a promise
> that they can replace the repo with my modified version when I'm done. This
> might make some of the suggestions hard to work with, but I'll see if it
> seems possible. Also, the server runs 1.8, and I have no authority to get it
> upgraded. I think I may have a chance to change the read permissions for the
> sync user though, so there's a ray of light somewhere in there :)
>
> W.r.t. Johan's question about the time consumption for dumping, I haven't
> yet been able to test it myself. I only got this as second-hand info from
> someone who did a dump of the repo last year, so I hope that is completely
> incorrect. Will try dumping as soon as I get my hands on a repo copy.
>
> Regarding why the repo is so large: my estimate from running some analysis
> on old revisions is that 90-95% of the data consists of beginners doing
> accidental commits of things that should not have been allowed to commit

Okay, good luck with those "operations". I wanted to add a couple more bits of info:

- After dump+filter+load or svnsync-with-filtering (effectively creating a new repository with an alternate history compared to the original) your new repository will / should have a new UUID. This effectively invalidates all existing working copies out there (which keep track of the UUID they were a checkout from). So all users will have to checkout new working copies.

- You can perfectly well use a 1.10 version of svnadmin or svnsync (or svnrdump, to create a dumpfile from a remote server) to interact with a 1.8 server / repository. So if using a more modern version of svnadmin or svnsync is beneficial, you should use it :).

- A dump file
Re: svndumpfilter and svnsync?
On Oct 10, 2018, at 02:04, Chris wrote: > I've trawled through bad commits of data files in our repo and added such > paths to a filter file that I'm using for svndumpfilter to get a > reasonably-looking dump. In most cases, the files in question existed in a > single path(branch( and were no problem. But in some cases, the same files > had been copied to a 2nd branch and then svndumpfilter gave me errors about > missing source paths, so I added the same path on the 2nd branch to the > filter expressions and tried again. After a few iterations of this process, I > have a dump that should do what I want. > So I start "svnadmin load" and based on initial progress, that might take a > couple of days to complete so I leave it overnight. I get back today and the > load has crashed with a missing path. The error was: > > svnadmin: E160013: File not found: transaction '16289-ckh', path > 'branches/second/dir/datafile' > > And looking up the history for that file, I see that "datafile" was added on > branch "first" but the path "branches/first/dir" is already in my filter > list. So why didn't svndumpfilter throw me an error on this like it did for a > lot of other cases? > Since the load process it so much slower, the turnaround time for each error > in that step is beyond painful, so if there's anything that I can do to > assure that this gets caught by the filter would make my life a lot easier. I don't know the answer to that, but: > The syntax I used: > svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > > filterdump > svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 > --bypass-prop-validation ./NEWREPO < filterdump > > (I had to use the bypass-prop-validation due to some newline issues in old > log message, similar to this one > https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, don't > know why they have wrong newlines, but the repo works as it is now...) 
Instead of ignoring wrong newlines, you could fix them using svndumptool (using its eolfix-revprop command), originally at:

http://svn.borg.ch/svndumptool/

Newer fork at:

https://github.com/jwiegley/svndumptool

> An additional question about what Johan wrote below:
>> - You can perfectly well use a 1.10 version of svnadmin or svnsync (or
>> svnrdump, to create a dumpfile from a remote server) to interact with a
>> 1.8 server / repository.
>
> Can I even do this with "svnadmin load"; I thought that would use an FSFS
> version 8 while 1.8 should have 6? I got that impression from my "research",
> but I'm probably off base.

If you use a newer version of svnadmin (than the one that will be used to serve the repo) to create the new repo and load the dump file, then make sure you pass the right --compatible-version argument to svnadmin create.
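The question of why svndumpfilter did not flag the missing path stays unanswered above. One scenario that might produce such an error (an assumption, not confirmed in this thread) is a copy of a parent directory, such as branches/first to branches/second, while only the subdirectory branches/first/dir is excluded: the copy source itself is not filtered, so svndumpfilter has nothing to complain about, yet the copy no longer carries dir/datafile once loaded. The Python sketch below scans the unfiltered dump for such copies before the slow load step. The names iter_records, is_under and risky_copies are hypothetical; the code relies only on the Revision-number, Node-path, Node-copyfrom-path and Content-length headers of the dump format, and it ignores copy-from revisions, so it may over-report.

```python
from typing import BinaryIO, Iterable, Iterator, Tuple


def iter_records(stream: BinaryIO) -> Iterator[dict]:
    """Yield the headers of every dump record, skipping the record bodies."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of dump
        if line == b"\n":
            continue  # blank lines separate records
        headers = {}
        while line not in (b"\n", b""):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode("utf-8")] = value.decode("utf-8", "replace")
            line = stream.readline()
        # Jump over props and file content so they are never parsed as headers.
        stream.read(int(headers.get("Content-length", "0")))
        yield headers


def is_under(path: str, prefix: str) -> bool:
    """True if path equals prefix or lies somewhere below it."""
    return path == prefix or path.startswith(prefix + "/")


def risky_copies(stream: BinaryIO,
                 excluded: Iterable[str]) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (revision, copy source, copy target, path to consider excluding)."""
    prefixes = [p.strip().strip("/") for p in excluded if p.strip()]

    def filtered(path: str) -> bool:
        return any(is_under(path, p) for p in prefixes)

    rev = None
    for headers in iter_records(stream):
        if "Revision-number" in headers:
            rev = int(headers["Revision-number"])
        src = headers.get("Node-copyfrom-path")
        if src is None:
            continue  # not a copy
        src, dst = src.strip("/"), headers["Node-path"].strip("/")
        if filtered(dst):
            continue  # the copy target is filtered out anyway
        if filtered(src):
            # Copy out of an excluded path: the target is a candidate too.
            yield rev, src, dst, dst
            continue
        for p in prefixes:
            if is_under(p, src):
                # A parent of an excluded path was copied; the matching
                # subpath under the target would be missing after filtering.
                implied = dst + p[len(src):]
                if not filtered(implied):
                    yield rev, src, dst, implied
```

A possible way to use it: read the prefixes from the same file that is passed to --targets, open the full (unfiltered) dump in binary mode, and print every tuple that risky_copies yields. The last element of each tuple is a path that might need to be added to the filter list before filtering and loading again.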
Re: svndumpfilter and svnsync?
On Wed, Oct 10, 2018 at 9:16 AM Ryan Schmidt wrote: > > > > On Oct 10, 2018, at 02:04, Chris wrote: > > > I've trawled through bad commits of data files in our repo and added such > > paths to a filter file that I'm using for svndumpfilter to get a > > reasonably-looking dump. In most cases, the files in question existed in a > > single path(branch( and were no problem. But in some cases, the same files > > had been copied to a 2nd branch and then svndumpfilter gave me errors about > > missing source paths, so I added the same path on the 2nd branch to the > > filter expressions and tried again. After a few iterations of this process, > > I have a dump that should do what I want. > > So I start "svnadmin load" and based on initial progress, that might take a > > couple of days to complete so I leave it overnight. I get back today and > > the load has crashed with a missing path. The error was: > > > > svnadmin: E160013: File not found: transaction '16289-ckh', path > > 'branches/second/dir/datafile' > > > > And looking up the history for that file, I see that "datafile" was added > > on branch "first" but the path "branches/first/dir" is already in my filter > > list. So why didn't svndumpfilter throw me an error on this like it did for > > a lot of other cases? > > Since the load process it so much slower, the turnaround time for each > > error in that step is beyond painful, so if there's anything that I can do > > to assure that this gets caught by the filter would make my life a lot > > easier. > > I don't know the answer to that, but: Hm, not really a clear answer here either. I don't know why svndumpfilter did not detect these. However, you might also give 'svnadmin dump --exclude' a try, if you can use version 1.10 of svnadmin. http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude This feature works similarly to 'svnsync with an authz file that denies the excluded files'. 
That means that, when the source of a copy is excluded, the copy is transformed into an add. So to completely eliminate a bad file and all its copies, it might be more difficult to get hold of those copies ... you won't get any warnings or errors, I think -- not sure if it emits a notification for such a copy-to-add conversion. OTOH, 'svnadmin dump --exclude' supports wildcards if you add the --pattern option, so it might be easier to filter out all appearances of a specific filename, as in 'svnadmin dump --pattern --exclude /*/datafile'.

> > The syntax I used:
> >   svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
> >   svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump
> >
> > (I had to use the bypass-prop-validation due to some newline issues in old
> > log message, similar to this one
> > https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, don't
> > know why they have wrong newlines, but the repo works as it is now...)
>
> Instead of ignoring wrong newlines, you could fix them using svndumptool
> (using its eolfix-revprop command), originally at:
>
> http://svn.borg.ch/svndumptool/
>
> Newer fork at:
>
> https://github.com/jwiegley/svndumptool

Also, as of version 1.10, svnadmin finally has an option to normalize these on-the-fly during 'load':
http://subversion.apache.org/docs/release-notes/1.10.html#normalize-props

It's a lot better to normalize these (either with the --normalize-props option for 'svnadmin load' or by using svndumptool) than to "bypass" them. Otherwise you'll run into this again later (if you would dump+load again sometime in the future).

And another tip: put the repo-to-be-loaded-into (NEWREPO) on as fast a storage system as possible (SSD, ramdisk if feasible, ...). If you're satisfied with the result, run 'svnadmin pack' on that fast storage, and only then copy it over to the final location.
Depending on the final storage that technique might save you a lot of time (especially if you have to redo it a couple of times). > > An additional question about what Johan wrote below: > >> - You can perfectly well use a 1.10 version of svnadmin or svnsync (or > >> svnrdump, to create > >> a dumpfile from a remote server) to interact with a 1.8 server / > >> repository. > > > > Can I even do this with "svnadmin load"; I thought that would use an FSFS > > version 8 while 1.8 should have 6? I got that impression from my > > "research", but I'm probably off base. > > If you use a newer version of svnadmin (than the one that will be used to > serve the repo) to create the new repo and load the dump file, then make sure > you pass the right --compatible-version argument to svnadmin create. Indeed. It's at 'svnadmin create' time that the FSFS version is decided. 'svnadmin load' will just "commit" new revisions in the repository that you first created, and it will follow / respect the FSFS format that's already set. So it's perfectly doable to create an
Re: svndumpfilter and svnsync?
Big thanks for the help, it is greatly appreciated! Some comments and further questions inline below. >> >> On Oct 10, 2018, at 02:04, Chris wrote: >> >>> I've trawled through bad commits of data files in our repo and added > such paths to a filter file that I'm using for svndumpfilter to get a > reasonably-looking dump. In most cases, the files in question existed in > a single path(branch( and were no problem. But in some cases, the same > files had been copied to a 2nd branch and then svndumpfilter gave me > errors about missing source paths, so I added the same path on the 2nd > branch to the filter expressions and tried again. After a few iterations > of this process, I have a dump that should do what I want. >>> So I start "svnadmin load" and based on initial progress, that might > take a couple of days to complete so I leave it overnight. I get back > today and the load has crashed with a missing path. The error was: >>> >>> svnadmin: E160013: File not found: transaction '16289-ckh', path >>> 'branches/second/dir/datafile' >>> >>> And looking up the history for that file, I see that "datafile" was > added on branch "first" but the path "branches/first/dir" is already in > my filter list. So why didn't svndumpfilter throw me an error on this > like it did for a lot of other cases? >>> Since the load process it so much slower, the turnaround time for > each error in that step is beyond painful, so if there's anything that I > can do to assure that this gets caught by the filter would make my life > a lot easier. >> >> I don't know the answer to that, but: > > Hm, not really a clear answer here either. I don't know why > svndumpfilter did not detect these. > > However, you might also give 'svnadmin dump --exclude' a try, if you can > use version 1.10 of svnadmin. > http://subversion.apache.org/docs/release-notes/1.10.html#dump-include- > exclude > > This feature works similarly to 'svnsync with an authz file that > denies the excluded files'. 
> That means that, when the source of a copy is excluded, the copy is
> transformed into an add. So to completely eliminate a bad file and all
> its copies, it might be more difficult to get hold of those copies ...
> you won't get any warnings or errors, I think -- not sure if it emits a
> notification for such a copy-to-add conversion. OTOH, 'svnadmin dump
> --exclude' supports wildcards if you add the --pattern option, so it
> might be easier to filter out all appearances of a specific filename,
> as in 'svnadmin dump --pattern --exclude /*/datafile'.

I'll try that. It will be a monster of a command line, since dump+exclude doesn't have the "--targets" option from svndumpfilter and I have 150-ish exclude statements, but it should be doable. I'm not sure how much I can use patterns, given how the bad commits looked, but they should compress the command line somewhat.

>>> The syntax I used:
>>>   svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
>>>   svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump
>>>
>>> (I had to use the bypass-prop-validation due to some newline issues
>>> in old log message, similar to this one
>>> https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA,
>>> don't know why they have wrong newlines, but the repo works as it is
>>> now...)
>>
>> Instead of ignoring wrong newlines, you could fix them using
>> svndumptool (using its eolfix-revprop command), originally at:
>>
>> http://svn.borg.ch/svndumptool/
>>
>> Newer fork at:
>>
>> https://github.com/jwiegley/svndumptool
>
> Also, as of version 1.10, svnadmin finally has an option to normalize
> these on-the-fly during 'load':
> http://subversion.apache.org/docs/release-notes/1.10.html#normalize-props
>
> It's a lot better to normalize these (either with the
> --normalize-props option for 'svnadmin load' or by using svndumptool)
> than to "bypass" them.
> Otherwise you'll run into this again later (if you would dump+load
> again sometime in the future).

I tried --normalize-props and I still got the same error, which is why I switched over to bypass. Maybe I've run into some bug with --normalize-props. Unfortunately, I don't think I'll be able to create a script for reproducing the error, since it happens far into a monster dump load. So I'll stick with the bypass for now or try the tool that Ryan suggested.

> And another tip: put the repo-to-be-loaded-into (NEWREPO) on as fast a
> storage system as possible (SSD, ramdisk if feasible, ...). If you're
> satisfied with the result, run 'svnadmin pack' on that fast storage,
> and only then copy it over to the final location. Depending on the
> final storage that technique might save you a lot of time (especially
> if you have to redo it a couple of times).

True, I should have thought of that myself. I'll see what I can do here. Corporate IT policies put some restraints on me, but it's definitely worth a shot. Just need to manage to install a svn 1.10 on
Re: svndumpfilter and svnsync?
On Wed, Oct 10, 2018 at 11:18 AM Chris wrote: ... > >>> The syntax I used: svnadmin dump -q MYREPO | svndumpfilter exclude > >>> --targets filterfile filterdump svnadmin load -q --no-flush-to-disk > >>> --force-uuid -M 2048 --bypass- prop-validation ./NEWREPO < filterdump > >>> > >>> (I had to use the bypass-prop-validation due to some newline issues > > in old log message, similar to this one > > https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, > > don't know why they have wrong newlines, but the repo works as it is > > now...) > >> > >> Instead of ignoring wrong newlines, you could fix them using > >> svndumptool (using its eolfix-revprop command), originally at: > >> > >> http://svn.borg.ch/svndumptool/ > >> > >> Newer fork at: > >> > >> https://github.com/jwiegley/svndumptool > > > > Also, as of version 1.10, svnadmin finally has an option to normalize > > these on-the-fly during 'load': > > http://subversion.apache.org/docs/release-notes/1.10.html#normalize- > > props > > > > It's a lot better to normalize these (either with the > > --normalize-props option for 'svnadmin load' or by using svndumptool) > > than to "bypass" them. Otherwise you'll run into this again later (if > > you would dump+load again sometime in the future). > > I tried --normalize-props and I still got the same error which is why I > switched over to bypass. Maybe I've run into some bug with --normalize-props. > Unfortunately, I don't think I'll be able to create a script for reproducing > the error since it happens far into a monster dump load. > So I'll stick with the bypass for now or try the tool that Ryan suggested. In that case the culprit might be another property than svn:log (or it might be something like "non UTF-8 encoded" but not EOL-related in svn:log). Possibly a "versioned" property like svn:ignore or some other property in the svn: namespace. 
This is more difficult to fix, but still it might be best to get rid of it or you'll run into it again in the future. See the very last bullet in: http://subversion.apache.org/faq.html#dumpload If that's indeed the problem, then you'll have to use that svndumptool that Ryan pointed you to. Quoting from that last bullet in the FAQ entry above: "This is more difficult to repair, because 'svn:ignore' is not a revision property (unlike svn:log, which can be manipulated with svnadmin setrevprop), but a versioned property (so it's part of history). Again, you can ignore this with --bypass-prop-validation. But since this is a corruption "in history", this can only be repaired with a dump+load, so this might be a good time to try and fix this (or you'll run into this again in the future). To repair it you can use a tool like svndumptool. But it only works on dump files, not as part of a pipe. So a possible way to go about it is: dump that single (corrupt) revision to a file, repair it ('svndumptool.py eolfix-prop svn:ignore svn.dump svn.dump.repaired'), load that single dumpfile, and then continue with a new "piped" command (like step (6) above). " I should note here that svnsync is more powerful in this regard: it does have the ability to normalize all of these on the fly. It's a real pity that 'svnadmin load' doesn't (except for the svn:log EOL fixing). Doesn't *yet* that is, until a volunteer comes along that submits a patch for it ;-). Anyway, I hope you succeed in cleaning this up eventually :-). -- Johan
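To find out which property trips the validation without waiting for a long load to fail, the dump file can be scanned directly. Below is a rough Python sketch, assuming (as suggested above) that the culprit is a property in the svn: namespace whose value contains carriage returns or is not valid UTF-8. The names parse_props and suspicious_props are hypothetical, and the two checks only approximate what 'svnadmin load' validates; they are not its actual implementation.

```python
from typing import BinaryIO, Dict, Iterator, Tuple


def parse_props(block: bytes) -> Dict[str, bytes]:
    """Parse a 'K len / V len ... PROPS-END' property block."""
    props: Dict[str, bytes] = {}
    name, pos = "", 0
    while pos < len(block):
        eol = block.index(b"\n", pos)
        tag = block[pos:eol]
        if tag == b"PROPS-END":
            break
        kind, _, length = tag.partition(b" ")
        start = eol + 1
        data = block[start:start + int(length)]
        pos = start + int(length) + 1  # skip the payload and its newline
        if kind == b"K":
            name = data.decode("utf-8", "replace")
        elif kind == b"V":
            props[name] = data
        # 'D' entries (deleted properties in delta dumps) carry no value.
    return props


def suspicious_props(stream: BinaryIO) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (revision, node path or '(revprops)', property name, problem)."""
    rev = None
    while True:
        line = stream.readline()
        if not line:
            return  # end of dump
        if line == b"\n":
            continue  # blank lines separate records
        headers = {}
        while line not in (b"\n", b""):
            key, _, value = line.rstrip(b"\n").partition(b": ")
            headers[key.decode("utf-8")] = value.decode("utf-8", "replace")
            line = stream.readline()
        body = stream.read(int(headers.get("Content-length", "0")))
        if "Revision-number" in headers:
            rev = int(headers["Revision-number"])
        where = headers.get("Node-path", "(revprops)")
        # The property block comes first in the record body.
        prop_len = int(headers.get("Prop-content-length", "0"))
        for name, value in parse_props(body[:prop_len]).items():
            if not name.startswith("svn:"):
                continue  # assumption: only svn: properties are validated
            if b"\r" in value:
                yield rev, where, name, "CR line ending"
            try:
                value.decode("utf-8")
            except UnicodeDecodeError:
                yield rev, where, name, "not UTF-8"
```

Run over the dump opened in binary mode, each reported tuple names a revision and property that could then be dumped on its own and repaired with svndumptool, along the lines of the FAQ procedure quoted above.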
Fwd: Cleanup needs file present
Hi!

Please see the forwarded message below. TSVN folks said I should report it here. The TSVN program mentioned uses Subversion 1.10.2.

---------- Forwarded message ---------
From: David Balažic via TortoiseSVN < tortoisesvn+apn2wqcbgqsoa3mwkzzi2nerxnmzj7xvoios7psfbx_2r0u8y...@googlegroups.com >
Date: Tue, 9 Oct 2018 at 13:22
Subject: Cleanup needs file present
To: TortoiseSVN

With TSVN 1.10.1 x64 I tried to commit a changed version of two files in the same folder, but the folder was not updated. Long story short: the WC got corrupted and needed a cleanup.

As there was a conflict during update which created those "merge" files, I decided to delete all the affected files in the folder (the new, yet uncommitted version, the xxx.r1234 files and similar). Then I ran Cleanup several times with all options enabled, but it always failed, complaining about a missing file (one of the original files I tried to change and commit). I tried revert, but it just complained that I need to run cleanup first.

Then I checked out that folder (at that exact revision) into a new WC, copied the files to the old WC, ran cleanup, and then it finally worked.

What I consider a bug here is the fact that cleanup insists on the presence of files which it could simply copy from the .svn folder.

Regards,
David