Re: svndumpfilter and svnsync?

2018-10-10 Thread Chris
Hi again,

I managed to get some better permissions so I don't have to do svnsync and can 
get by with doing incremental dumps/loads, but I'm a bit confused by the 
svndumpfilter + load process so any help would be appreciated.

First of all, my statement about the dump taking 2 weeks was a big fat urban 
legend. More like 20 minutes so that's good news.

I've trawled through bad commits of data files in our repo and added such paths 
to a filter file that I'm using for svndumpfilter to get a reasonably-looking 
dump. In most cases, the files in question existed in a single path (branch) and
were no problem. But in some cases, the same files had been copied to a 2nd 
branch and then svndumpfilter gave me errors about missing source paths, so I 
added the same path on the 2nd branch to the filter expressions and tried 
again. After a few iterations of this process, I have a dump that should do 
what I want.
So I start "svnadmin load" and based on initial progress, that might take a 
couple of days to complete so I leave it overnight. I get back today and the 
load has crashed with a missing path. The error was:

svnadmin: E160013: File not found: transaction '16289-ckh', path 
'branches/second/dir/datafile'

And looking up the history for that file, I see that "datafile" was added on 
branch "first" but the path "branches/first/dir" is already in my filter list. 
So why didn't svndumpfilter throw me an error on this like it did for a lot of 
other cases?
Since the load process is so much slower, the turnaround time for each error in 
that step is beyond painful, so anything that I can do to ensure that this gets 
caught by the filter would make my life a lot easier.

The syntax I used:
svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump

(I had to use the bypass-prop-validation due to some newline issues in old log 
message, similar to this one 
https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, don't 
know why they have wrong newlines, but the repo works as it is now...)


An additional question about what Johan wrote below:
>- You can perfectly well use a 1.10 version of svnadmin or svnsync (or 
>svnrdump, to create
>a dumpfile from a remote server) to interact with a 1.8 server / repository.

Can I even do this with "svnadmin load"; I thought that would use an FSFS 
version 8 while 1.8 should have 6? I got that impression from my "research", 
but I'm probably off base.

TIA,
  Chris



On Thu, 10/4/18, Johan Corveleyn wrote:

 Subject: Re: svndumpfilter and svnsync?
 To: "Chris"
 Cc: "Ryan Schmidt", "Daniel Shahaf", "Subversion"
 Date: Thursday, October 4, 2018, 4:26 PM
 
 On Thu, Oct 4, 2018 at 3:03 PM Chris wrote:
 >
 > (apologies for the top-posting, I really need to stop using this yahoo
 > web interface which is useless with quoting)
 >
 > Thanks for all the replies. I'll try out what you outlined. There are
 > unfortunately problems outside of my control that make it worse and
 > that is that for company-internal policy reasons, I'm not allowed
 > direct access to the server, I'm only able to get a copy of the repo to
 > work with and a promise that they can replace the repo with my modified
 > version when I'm done. This might make some of the suggestions hard to
 > work with, but I'll see if it seems possible. Also, the server runs 1.8,
 > and I have no authority to get it upgraded. I think I may have a chance
 > to change the read permissions for the sync user though, so there's a
 > ray of light somewhere in there :)
 >
 > W.r.t. Johan's question about the time consumption for dumping, I
 > haven't yet been able to test it myself, I only got this as second-hand
 > info from someone who did a dump of the repo last year, so I hope that
 > is completely incorrect. Will try dumping as soon as I get my hands on
 > a repo copy.
 >
 > Regarding why the repo is so large: my estimate from running some
 > analysis on old revisions is that 90-95% of the data consists of
 > beginners doing accidental commits of things that should not have been
 > allowed to commit
 >

 Okay, good luck with those "operations". I wanted to add a couple more
 bits of info:

 - After dump+filter+load or svnsync-with-filtering (effectively creating
 a new repository with an alternate history compared to the original)
 your new repository will / should have a new UUID. This effectively
 invalidates all existing working copies out there (which keep track of
 the UUID they were a checkout from). So all users will have to checkout
 new working copies.

 - You can perfectly well use a 1.10 version of svnadmin or svnsync (or
 svnrdump, to create a dumpfile from a remote server) to interact with
 a 1.8 server / repository. So if using a more modern version of
 svnadmin or svnsync is beneficial, you should use it :).

 - A dump file

Re: svndumpfilter and svnsync?

2018-10-10 Thread Ryan Schmidt



On Oct 10, 2018, at 02:04, Chris wrote:

> I've trawled through bad commits of data files in our repo and added such 
> paths to a filter file that I'm using for svndumpfilter to get a 
> reasonably-looking dump. In most cases, the files in question existed in a 
> single path (branch) and were no problem. But in some cases, the same files 
> had been copied to a 2nd branch and then svndumpfilter gave me errors about 
> missing source paths, so I added the same path on the 2nd branch to the 
> filter expressions and tried again. After a few iterations of this process, I 
> have a dump that should do what I want.
> So I start "svnadmin load" and based on initial progress, that might take a 
> couple of days to complete so I leave it overnight. I get back today and the 
> load has crashed with a missing path. The error was:
> 
> svnadmin: E160013: File not found: transaction '16289-ckh', path 
> 'branches/second/dir/datafile'
> 
> And looking up the history for that file, I see that "datafile" was added on 
> branch "first" but the path "branches/first/dir" is already in my filter 
> list. So why didn't svndumpfilter throw me an error on this like it did for a 
> lot of other cases?
> Since the load process is so much slower, the turnaround time for each error 
> in that step is beyond painful, so anything that I can do to ensure 
> that this gets caught by the filter would make my life a lot easier.

I don't know the answer to that, but:


> The syntax I used:
> svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > 
> filterdump
> svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 
> --bypass-prop-validation ./NEWREPO < filterdump
> 
> (I had to use the bypass-prop-validation due to some newline issues in old 
> log message, similar to this one 
> https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, don't 
> know why they have wrong newlines, but the repo works as it is now...)

Instead of ignoring wrong newlines, you could fix them using svndumptool (using 
its eolfix-revprop command), originally at:

http://svn.borg.ch/svndumptool/

Newer fork at:

https://github.com/jwiegley/svndumptool


> An additional question about what Johan wrote below:
>> - You can perfectly well use a 1.10 version of svnadmin or svnsync (or 
>> svnrdump, to create
>> a dumpfile from a remote server) to interact with a 1.8 server / repository.
> 
> Can I even do this with "svnadmin load"; I thought that would use an FSFS 
> version 8 while 1.8 should have 6? I got that impression from my "research", 
> but I'm probably off base.

If you use a newer version of svnadmin (than the one that will be used to serve 
the repo) to create the new repo and load the dump file, then make sure you 
pass the right --compatible-version argument to svnadmin create.
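
A minimal sketch of that create step, assuming the new repository is built with a 1.10 svnadmin and will later be served by a 1.8 server. ./NEWREPO is the path from Chris's command line; the run helper and the RUN variable are hypothetical conveniences that only print the command unless RUN=1 is set.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}

# Create the target repository in a format that an older server can read.
# $1 = repository path, $2 = oldest Subversion version that must read it
create_compatible_repo() {
    run svnadmin create --compatible-version "$2" "$1"
}

create_compatible_repo ./NEWREPO 1.8
```

Run as shown, this only prints the svnadmin create command; with RUN=1 in the environment it would actually create the repository, after which the usual 'svnadmin load' can follow.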




Re: svndumpfilter and svnsync?

2018-10-10 Thread Johan Corveleyn
On Wed, Oct 10, 2018 at 9:16 AM Ryan Schmidt
 wrote:
>
>
>
> On Oct 10, 2018, at 02:04, Chris wrote:
>
> > I've trawled through bad commits of data files in our repo and added such 
> > paths to a filter file that I'm using for svndumpfilter to get a 
> > reasonably-looking dump. In most cases, the files in question existed in a 
> > single path (branch) and were no problem. But in some cases, the same files 
> > had been copied to a 2nd branch and then svndumpfilter gave me errors about 
> > missing source paths, so I added the same path on the 2nd branch to the 
> > filter expressions and tried again. After a few iterations of this process, 
> > I have a dump that should do what I want.
> > So I start "svnadmin load" and based on initial progress, that might take a 
> > couple of days to complete so I leave it overnight. I get back today and 
> > the load has crashed with a missing path. The error was:
> >
> > svnadmin: E160013: File not found: transaction '16289-ckh', path 
> > 'branches/second/dir/datafile'
> >
> > And looking up the history for that file, I see that "datafile" was added 
> > on branch "first" but the path "branches/first/dir" is already in my filter 
> > list. So why didn't svndumpfilter throw me an error on this like it did for 
> > a lot of other cases?
> > Since the load process is so much slower, the turnaround time for each 
> > error in that step is beyond painful, so anything that I can do to ensure 
> > that this gets caught by the filter would make my life a lot easier.
>
> I don't know the answer to that, but:

Hm, not really a clear answer here either. I don't know why
svndumpfilter did not detect these.
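
A possible explanation, not confirmed anywhere in this thread: svndumpfilter matches node paths by prefix, so a copy of a parent directory (say branches/first to branches/second) passes the filter even though an excluded subtree (branches/first/dir) travels with it implicitly. After filtering, that subtree no longer exists at the copy target, and a later change to a file below it fails at load time. The sketch below, assuming that is what happened, scans a dump stream for such parent-directory copies and prints the implied paths that would also need excluding. The function name implied_copies is hypothetical, and matching header lines with awk is a heuristic that file contents in the dump could fool.

```shell
#!/bin/sh
# Read a dump stream on stdin and list paths that implicitly receive a copy
# of an excluded path because one of its parent directories was copied.
# $1 = file with one excluded path prefix per line
implied_copies() {
    LC_ALL=C awk -v filter="$1" '
        BEGIN {
            # Remember the excluded prefixes, without a leading slash.
            while ((getline line < filter) > 0) {
                sub(/^\//, "", line)
                if (line != "") excluded[line] = 1
            }
        }
        # Destination of the node currently being described.
        /^Node-path: / { dest = substr($0, 12); next }
        # Copy source of that node, if it is a copy.
        /^Node-copyfrom-path: / {
            src = substr($0, 21)
            sub(/^\//, "", src)
            for (p in excluded)
                # src is a proper ancestor of the excluded path p, so the
                # copy drags p along under the destination.
                if (index(p, src "/") == 1)
                    print dest substr(p, length(src) + 1)
        }
    '
}
```

It could be used as 'svnadmin dump -q MYREPO | implied_copies filterfile | sort -u'. Copies of copies would only show up after the reported paths have been added to the filter file and the scan has been repeated.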

However, you might also give 'svnadmin dump --exclude' a try, if you
can use version 1.10 of svnadmin.
http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude

This feature works similarly to 'svnsync with an authz file that
denies the excluded files'. That means that, when the source of a copy
is excluded, the copy is transformed into an add. So completely
eliminating a bad file and all its copies might be more difficult,
because it is harder to get hold of these copies: you won't get any
warnings or errors, I think (I'm not sure whether it emits a
notification for such a copy-to-add conversion). OTOH, 'svnadmin dump
--exclude' supports wildcards if you add the --pattern option, so it
might be easier to filter out all appearances of a specific filename,
as in 'svnadmin dump --pattern --exclude /*/datafile'.


>
> > The syntax I used:
> > svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > 
> > filterdump
> > svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 
> > --bypass-prop-validation ./NEWREPO < filterdump
> >
> > (I had to use the bypass-prop-validation due to some newline issues in old 
> > log message, similar to this one 
> > https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA, don't 
> > know why they have wrong newlines, but the repo works as it is now...)
>
> Instead of ignoring wrong newlines, you could fix them using svndumptool 
> (using its eolfix-revprop command), originally at:
>
> http://svn.borg.ch/svndumptool/
>
> Newer fork at:
>
> https://github.com/jwiegley/svndumptool

Also, as of version 1.10, svnadmin finally has an option to normalize
these on-the-fly during 'load':
http://subversion.apache.org/docs/release-notes/1.10.html#normalize-props

It's a lot better to normalize these (either with the
--normalize-props option for 'svnadmin load' or by using svndumptool)
than to "bypass" them. Otherwise you'll run into this again later (if
you would dump+load again sometime in the future).

And another tip: put the repo-to-be-loaded-into (NEWREPO) on as fast a
storage system as possible (SSD, ramdisk if feasible, ...). If you're
satisfied with the result, run 'svnadmin pack' on that fast storage,
and only then copy it over to the final location. Depending on the
final storage that technique might save you a lot of time (especially
if you have to redo it a couple of times).

> > An additional question about what Johan wrote below:
> >> - You can perfectly well use a 1.10 version of svnadmin or svnsync (or 
> >> svnrdump, to create
> >> a dumpfile from a remote server) to interact with a 1.8 server / 
> >> repository.
> >
> > Can I even do this with "svnadmin load"; I thought that would use an FSFS 
> > version 8 while 1.8 should have 6? I got that impression from my 
> > "research", but I'm probably off base.
>
> If you use a newer version of svnadmin (than the one that will be used to 
> serve the repo) to create the new repo and load the dump file, then make sure 
> you pass the right --compatible-version argument to svnadmin create.

Indeed. It's at 'svnadmin create' time that the FSFS version is
decided. 'svnadmin load' will just "commit" new revisions in the
repository that you first created, and it will follow / respect the
FSFS format that's already set. So it's perfectly doable to create an

Re: svndumpfilter and svnsync?

2018-10-10 Thread Chris
Big thanks for the help, it is greatly appreciated!
Some comments and further questions inline below.

>> 
>> On Oct 10, 2018, at 02:04, Chris wrote:
>> 
>>> I've trawled through bad commits of data files in our repo and added
>>> such paths to a filter file that I'm using for svndumpfilter to get a
>>> reasonably-looking dump. In most cases, the files in question existed in
>>> a single path (branch) and were no problem. But in some cases, the same
>>> files had been copied to a 2nd branch and then svndumpfilter gave me
>>> errors about missing source paths, so I added the same path on the 2nd
>>> branch to the filter expressions and tried again. After a few iterations
>>> of this process, I have a dump that should do what I want.
>>> So I start "svnadmin load" and based on initial progress, that might
>>> take a couple of days to complete so I leave it overnight. I get back
>>> today and the load has crashed with a missing path. The error was:
>>> 
>>> svnadmin: E160013: File not found: transaction '16289-ckh', path
>>> 'branches/second/dir/datafile'
>>> 
>>> And looking up the history for that file, I see that "datafile" was
>>> added on branch "first" but the path "branches/first/dir" is already in
>>> my filter list. So why didn't svndumpfilter throw me an error on this
>>> like it did for a lot of other cases?
>>> Since the load process is so much slower, the turnaround time for
>>> each error in that step is beyond painful, so anything that I
>>> can do to ensure that this gets caught by the filter would make my life
>>> a lot easier.
>> 
>> I don't know the answer to that, but:
> 
> Hm, not really a clear answer here either. I don't know why
> svndumpfilter did not detect these.
> 
> However, you might also give 'svnadmin dump --exclude' a try, if you can
> use version 1.10 of svnadmin.
> http://subversion.apache.org/docs/release-notes/1.10.html#dump-include-exclude
> 
> This feature works similarly to 'svnsync with an authz file that
> denies the excluded files'. That means that, when the source of a copy
> is excluded, the copy is transformed into an add. So completely
> eliminating a bad file and all its copies might be more difficult,
> because it is harder to get hold of these copies: you won't get any
> warnings or errors, I think (I'm not sure whether it emits a
> notification for such a copy-to-add conversion). OTOH, 'svnadmin dump
> --exclude' supports wildcards if you add the --pattern option, so it
> might be easier to filter out all appearances of a specific filename,
> as in 'svnadmin dump --pattern --exclude /*/datafile'.

I'll try that. It will be a monster of a command line, since dump --exclude
doesn't have the "--targets" option from svndumpfilter and I have 150-ish
exclude statements, but it should be doable.
I'm not sure how much I can use patterns, given how the bad commits looked,
but they should compress the command line somewhat.
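
One way around the long command line, sketched under assumptions: build the repeated --exclude options from the existing filter file. The function name dump_with_excludes and the RUN variable are made up for this sketch, and it assumes that svnadmin 1.10 accepts --exclude more than once and expects repository paths with a leading slash; 'svnadmin help dump' should be checked before relying on either.

```shell
#!/bin/sh
# Print (or, with RUN=1, run) an svnadmin dump command that carries one
# --exclude option per non-empty line of a filter file.
# $1 = repository path, $2 = filter file with one path prefix per line
dump_with_excludes() {
    repo=$1
    filterfile=$2
    # Collect the command in the positional parameters of this function.
    set -- svnadmin dump -q "$repo"
    while IFS= read -r prefix || [ -n "$prefix" ]; do
        [ -n "$prefix" ] || continue
        # Assumption: --exclude wants a leading slash on each path.
        case $prefix in
            /*) ;;
            *) prefix="/$prefix" ;;
        esac
        set -- "$@" --exclude "$prefix"
    done < "$filterfile"
    if [ "${RUN:-0}" = 1 ]; then "$@"; else printf '%s\n' "$*"; fi
}
```

Called as 'dump_with_excludes MYREPO filterfile' it prints the assembled command for review; 'RUN=1 dump_with_excludes MYREPO filterfile > filterdump' would run it. The printed form does not quote paths that contain spaces, so it is meant for reading rather than for pasting back into a shell.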

> 
> 
>> 
>>> The syntax I used:
>>> svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
>>> svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump
>>> 
>>> (I had to use the bypass-prop-validation due to some newline issues
>>> in old log message, similar to this one
>>> https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA,
>>> don't know why they have wrong newlines, but the repo works as it is
>>> now...)
>> 
>> Instead of ignoring wrong newlines, you could fix them using
>> svndumptool (using its eolfix-revprop command), originally at:
>> 
>> http://svn.borg.ch/svndumptool/
>> 
>> Newer fork at:
>> 
>> https://github.com/jwiegley/svndumptool
> 
> Also, as of version 1.10, svnadmin finally has an option to normalize
> these on-the-fly during 'load':
> http://subversion.apache.org/docs/release-notes/1.10.html#normalize-props
> 
> It's a lot better to normalize these (either with the
> --normalize-props option for 'svnadmin load' or by using svndumptool)
> than to "bypass" them. Otherwise you'll run into this again later (if
> you would dump+load again sometime in the future).

I tried --normalize-props and I still got the same error which is why I
switched over to bypass. Maybe I've run into some bug with --normalize-props.
Unfortunately, I don't think I'll be able to create a script for reproducing
the error since it happens far into a monster dump load.
So I'll stick with the bypass for now or try the tool that Ryan suggested.

> 
> And another tip: put the repo-to-be-loaded-into (NEWREPO) on as fast a
> storage system as possible (SSD, ramdisk if feasible, ...). If you're
> satisfied with the result, run 'svnadmin pack' on that fast storage,
> and only then copy it over to the final location. Depending on the
> final storage that technique might save you a lot of time (especially
> if you have to redo it a couple of times).

True, I should have thought of that myself.
I'll see what I can do here. Corporate IT policies put some constraints
on me, but it's definitely worth a shot. Just need to manage to
install a svn 1.10 on

Re: svndumpfilter and svnsync?

2018-10-10 Thread Johan Corveleyn
On Wed, Oct 10, 2018 at 11:18 AM Chris  wrote:
...
> >>> The syntax I used:
> >>> svnadmin dump -q MYREPO | svndumpfilter exclude --targets filterfile > filterdump
> >>> svnadmin load -q --no-flush-to-disk --force-uuid -M 2048 --bypass-prop-validation ./NEWREPO < filterdump
> >>>
> >>> (I had to use the bypass-prop-validation due to some newline issues
> >>> in old log message, similar to this one
> >>> https://groups.google.com/forum/#!topic/subversion_users/P3ohZ-hKhCA,
> >>> don't know why they have wrong newlines, but the repo works as it is
> >>> now...)
> >>
> >> Instead of ignoring wrong newlines, you could fix them using
> >> svndumptool (using its eolfix-revprop command), originally at:
> >>
> >> http://svn.borg.ch/svndumptool/
> >>
> >> Newer fork at:
> >>
> >> https://github.com/jwiegley/svndumptool
> >
> > Also, as of version 1.10, svnadmin finally has an option to normalize
> > these on-the-fly during 'load':
> > http://subversion.apache.org/docs/release-notes/1.10.html#normalize-props
> >
> > It's a lot better to normalize these (either with the
> > --normalize-props option for 'svnadmin load' or by using svndumptool)
> > than to "bypass" them. Otherwise you'll run into this again later (if
> > you would dump+load again sometime in the future).
>
> I tried --normalize-props and I still got the same error which is why I
> switched over to bypass. Maybe I've run into some bug with --normalize-props.
> Unfortunately, I don't think I'll be able to create a script for reproducing
> the error since it happens far into a monster dump load.
> So I'll stick with the bypass for now or try the tool that Ryan suggested.

In that case the culprit might be a property other than svn:log (or it
might be something in svn:log that is not EOL-related, such as text
that is not UTF-8 encoded). Possibly a "versioned" property like
svn:ignore or some other property in the svn: namespace. This is more
difficult to fix, but it might still be best to get rid of it, or
you'll run into it again in the future.

See the very last bullet in:
http://subversion.apache.org/faq.html#dumpload

If that's indeed the problem, then you'll have to use that svndumptool
that Ryan pointed you to.
Quoting from that last bullet in the FAQ entry above:

"This is more difficult to repair, because 'svn:ignore' is not a
revision property (unlike svn:log, which can be manipulated with
svnadmin setrevprop), but a versioned property (so it's part of
history). Again, you can ignore this with --bypass-prop-validation.
But since this is a corruption "in history", this can only be repaired
with a dump+load, so this might be a good time to try and fix this (or
you'll run into this again in the future). To repair it you can use a
tool like svndumptool. But it only works on dump files, not as part of
a pipe. So a possible way to go about it is: dump that single
(corrupt) revision to a file, repair it ('svndumptool.py eolfix-prop
svn:ignore svn.dump svn.dump.repaired'), load that single dumpfile,
and then continue with a new "piped" command (like step (6) above). "
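
The procedure quoted from the FAQ can be sketched as a small helper that only prints the commands for one corrupt revision, so they can be reviewed before anything is run. The function name print_repair_plan and the rN.dump file names are hypothetical; the svndumptool.py invocation follows the FAQ text quoted above, and the sketch assumes that all revisions before the corrupt one are already loaded into the target repository.

```shell
#!/bin/sh
# Print, without running anything, the commands for loading around one
# corrupt revision. All arguments are placeholders supplied by the caller.
# $1 = source repo, $2 = target repo, $3 = corrupt revision, $4 = property
print_repair_plan() {
    src=$1
    dst=$2
    rev=$3
    prop=$4
    next=$((rev + 1))
    # Dump the single revision, repair it, load it, then resume the pipe.
    cat <<EOF
svnadmin dump -q --incremental -r ${rev} ${src} > r${rev}.dump
svndumptool.py eolfix-prop ${prop} r${rev}.dump r${rev}.dump.repaired
svnadmin load -q ${dst} < r${rev}.dump.repaired
svnadmin dump -q --incremental -r ${next}:HEAD ${src} | svnadmin load -q ${dst}
EOF
}
```

For example, 'print_repair_plan MYREPO NEWREPO 123 svn:ignore' prints four command lines for a hypothetical revision 123. If a filter such as svndumpfilter is part of the original pipeline, it would have to be added to the printed dump commands as well.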

I should note here that svnsync is more powerful in this regard: it
does have the ability to normalize all of these on the fly. It's a
real pity that 'svnadmin load' doesn't (except for the svn:log EOL
fixing). Doesn't *yet* that is, until a volunteer comes along that
submits a patch for it ;-).

Anyway, I hope you succeed in cleaning this up eventually :-).
-- 
Johan


Fwd: Cleanup needs file present

2018-10-10 Thread David Balažic
Hi!

Please see the forwarded message below. TSVN folks said I should report it
here.
The TSVN program mentioned uses Subversion 1.10.2.

---------- Forwarded message ---------
From: David Balažic via TortoiseSVN <tortoisesvn+apn2wqcbgqsoa3mwkzzi2nerxnmzj7xvoios7psfbx_2r0u8y...@googlegroups.com>
Date: Tue, 9 Oct 2018 at 13:22
Subject: Cleanup needs file present
To: TortoiseSVN 


With TSVN 1.10.1 x64 I tried to commit a changed version of two files in
the same folder, but the folder was not updated.
Long story short: the WC got corrupted and needed a cleanup.

As there was a conflict during update which created those "merge" files, I
decided to delete all the affected files in the folder (the new yet
uncommitted version, the xxx.r1234 files and similar).

Then I ran Cleanup, several times with all options enabled, but it always
failed, complaining about a missing file (one of the original files I tried
to change and commit).
I tried revert, but it just complained that I need to run cleanup first.

Then I checked out that folder (with that exact revision) into a new WC,
copied the files to the old WC, ran cleanup, and then it finally worked.


What I consider a bug here is the fact that cleanup insists on the presence
of files which it could simply copy from the .svn folder.

Regards,
David