Repository migrated via SVNSYNC is much smaller than one migrated using SVNADMIN DUMP/LOAD

2014-08-28 Thread Christopher Lamb

Hi all

we are in the process of migrating a SVN repository from Windows to Linux.

While experimenting with how best to do this, I have tried SVNADMIN
DUMP / LOAD, SVNSYNC, and even SVNRDUMP. The resulting target repos have
dramatically different sizes, hence this mail.


Original repo size 5.51 GB
Almost 19,000 revisions
SVN version 1.5.1
FSFS
Windows Server 2003

Target repo(s)
SVN version 1.7.14 (r1542130)
FSFS
Oracle Enterprise Linux 7.0

In both cases SVN was installed as an executable (i.e. not compiled or
modified by us).



I first attempted a migration using SVNADMIN DUMP / LOAD. This gave me a
dump file of 10.8 GB. However, as we encountered the "E125005: Cannot
accept non-LF line endings in 'svn:log' property" error (more on which at
the bottom of this mail), I also tried migrations using SVNSYNC and
SVNRDUMP, as these tools should auto-correct the non-LF line endings.



Migrating using SVNADMIN DUMP / LOAD, I get a target repo of 4.5 GB



Migrating using SVNSYNC, I get a target repo of 149 MB! I have tried
running SVNSYNC from the target machine (OEL 7.0), and from the source
machine (Win 2003), and get the same result. SVNRDUMP also gives a repo of
149 MB.



To eliminate OEL 7.0 / SVN 1.7.14 as factors, I also did a test migration
via SVNSYNC to an OEL 6.5 machine running SVN 1.6.11 (r934486). This gave a
repo of 148 MB.



So SVNADMIN gives me a 4.5 GB repo, which is close in size to the original
repo size (5.51 GB), but SVNSYNC & SVNRDUMP give me 149 MB repos!



Drilling down into the file system of the repos, I can see that:

a) the revprops directories of the SVNADMIN and SVNSYNC loaded repos are both
75 MB
b) the revs directories contain a similar number of files, but have
significant size differences:

4.4G  /svn_repos/repo_loadedby_svnadmin/db/revs
75M   /svn_repos/repo_loadedby_svnsync/db/revs


SVN LIST gives results for the repo loaded by SVNADMIN, but nothing for
those loaded via SVNSYNC.



Any ideas what is going wrong with SVNSYNC? Have I missed anything? I have
googled myself to near death, but cannot find any similar reports.



Thanks in advance for your help



Chris Lamb

IT Architect



p.s, on the topic of E125005:



The various sed-based solutions suggested on
http://stackoverflow.com/questions/10279222/how-can-i-fix-the-svn-import-line-endings-error
did not work for me. The resulting file produced the same error.


I did not want to use --bypass-prop-validation, as this just delays the
problem until a later date.

I ended up using the SVN 1.5.1 SVNSYNC (which also throws this error) to
identify the affected properties, then corrected them via svn propedit,
editing the property on the revision one higher than the last revision
reported.

svn propedit svn:log  svn://otms/repo_source --username  --password --revprop -r 14928

In most cases, just saving the property "unchanged" in vi was enough to
correct it. Having edited the property, I then reran SVNSYNC until the next
error ...
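
The manual loop described above could in principle be scripted. What follows is only a sketch and not the procedure actually used here: the function names strip_cr and fix_log_revprop are made up for illustration, it assumes the offending log messages use CRLF line endings, and it assumes the repository already permits revision property changes (as it must have for the propedit calls above to work).

```shell
# Delete carriage returns so that CRLF line endings become plain LF.
# Reads text on stdin, writes the cleaned text to stdout.
strip_cr() {
    tr -d '\r'
}

# Hypothetical helper: rewrite svn:log of one revision with LF-only endings.
# $1 = repository URL, $2 = revision number.
fix_log_revprop() {
    raw=$(mktemp) || return 1
    clean=$(mktemp) || { rm -f "$raw"; return 1; }
    # Fetch the message first, so a failed read never overwrites the log.
    if svn propget --revprop -r "$2" --strict svn:log "$1" > "$raw"; then
        strip_cr < "$raw" > "$clean" &&
            svn propset --revprop -r "$2" svn:log -F "$clean" "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$raw" "$clean"
    return "$status"
}
```

With the functions defined, a single revision could be handled with `fix_log_revprop svn://otms/repo_source 14928`. It would be wise to try this against a copy of the repository first.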











Re: Repository migrated via SVNSYNC is much smaller than one migrated using SVNADMIN DUMP/LOAD : SOLVED

2014-08-29 Thread Christopher Lamb
Hi All

Thanks to Andreas and Les for your kind hints.

Having read and re-read your replies, and reviewing how I was calling
SVNSYNC, and even a bit of RTFM, I have now solved this issue.

The Eureka moment came when I looked at the output from SVNSYNC, and
compared it to that shown in the SVNSYNC documentation.

It turns out I was incorrectly addressing the source repository (not
exactly Les's typo, but the result is the same).

Strangely, what I was doing was correct enough for properties and revisions
to be replicated, but not correct enough to transmit files.

Let's call my original attempt(s) "Bad", and the corrected approach "Good".

"Bad" Calls:
svnsync init svn://target_server/target_repo svn://source_server/source_repo
svnsync sync svn://target_server/target_repo svn://source_server/source_repo

"Bad" Output

Copied properties for revision 15.
Committed revision 16.
Copied properties for revision 16.
Committed revision 17.
Copied properties for revision 17.
Committed revision 18.


This certainly appeared to be doing something, and populated the revs and
revprops directories with the correct number of files. While the revprops
dir looked good, the revs dir was far too small (and of almost equal size
to the revprops dir)

I should be calling SVNSYNC as shown below

"Good" Calls:
svnsync init svn://target_server/target_repo svn://source_server
svnsync sync svn://target_server/target_repo svn://source_server

"Good" Output:

Transmitting file data .
Committed revision 7088.
Copied properties for revision 7088.
Transmitting file data .
Committed revision 7089.
Copied properties for revision 7089.


The difference is shown in the output: Here we additionally have
"Transmitting file data."

In the "bad" case I was addressing the source repo as
svn://source_server/source_repo
In the "good" case I address the source repo as
svn://source_server
In both cases the target repo is addressed as
svn://target_server/target_repo
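
The difference in output described above can be checked mechanically if the svnsync output is saved to a file. This is a minimal sketch: the function name sync_sent_file_data and the idea of capturing the output in a log file are assumptions for illustration. Individual revisions that only touch properties or directories may legitimately lack a "Transmitting file data" line, so the sketch only flags a log in which that line never appears at all.

```shell
# Report how many commits and file transfers appear in an svnsync log.
# $1 = file containing captured svnsync output.
# Fails (non-zero status) if no "Transmitting file data" line is present.
sync_sent_file_data() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    commits=$(grep -c '^Committed revision' "$1" || true)
    transfers=$(grep -c '^Transmitting file data' "$1" || true)
    echo "commits=$commits transfers=$transfers"
    if [ "$transfers" -eq 0 ]; then
        echo "warning: no file data was transmitted" >&2
        return 1
    fi
}
```

For example, after `svnsync sync svn://target_server/target_repo > sync.log`, calling `sync_sent_file_data sync.log` would have failed for the "Bad" calls and succeeded for the "Good" ones.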

So why the difference / confusion?

SvnServe on the target server is set up with one port (3690) for all repos,
so the repos need to be addressed as svn://target_server/target_repo

SvnServe on the source server is set up with one port per repo. As the repo
I am testing with uses the default port 3690, all I need to address it is
the server name: svn://source_server. Presumably for other repos I will need
to add the port e.g. svn://source_server:3691
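
One way to catch this kind of addressing mistake before running svnsync init might be to confirm that the source URL is the repository root. The sketch below is an assumption-laden illustration, not something tested in this thread: is_repo_root is a made-up name, and it relies on svn info printing "URL:" and "Repository Root:" lines, which it compares.

```shell
# Read `svn info` output on stdin and succeed only if the URL line
# matches the Repository Root line (i.e. the URL addresses the whole repo).
is_repo_root() {
    awk -F': ' '
        $1 == "URL"             { url = $2 }
        $1 == "Repository Root" { root = $2 }
        END { exit !(url != "" && url == root) }
    '
}
```

Usage would be along the lines of `svn info svn://source_server | is_repo_root && echo "OK to use as sync source"`. If svn info fails outright for the URL, the function receives no input and also reports failure.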

The annoying thing is that my addressing was incorrect, yet still correct
enough to do something...

Cheers

Chris

p.s
Now that I have removed CR LFs from over 600 revprops, I can use SVNADMIN
DUMP / LOAD for the actual migration, and SVNSYNC as part of the backup
plan.





From:   Andreas Stieger 
To: Christopher Lamb/Switzerland/IBM@IBMCH,
Cc: users@subversion.apache.org
Date:   28.08.2014 19:38
Subject:Re: Repository migrated via SVNSYNC is much smaller than one
migrated using SVNADMIN DUMP/LOAD



Hello,

On 28/08/14 17:12, Christopher Lamb wrote:
> While experimenting how best to do this I have tried using both SVNADMIN
> DUMP / LOAD, SVNSYNC, and even SVNRDUMP. The resulting target repos have
> dramatically different sizes, hence this mail.
>
>
> Original repo size 5.51 GB
> Almost 19,000 revisions
> SVN version 1.5.1
> FSFS
>
> Target repo(s)
> SVN version 1.7.14 (r1542130)
> FSFS
> [10.8GiB with dump load, 149 MiB with svnsync]


1. Representation sharing in 1.6:
https://subversion.apache.org/docs/release-notes/1.6.html#filesystem-improvements


2. When using svnsync, make sure the reading user has fully recursive
read access, and no subtree has restrictive controls.
From
http://svnbook.red-bean.com/en/1.7/svn.reposadmin.maint.html#svn.reposadmin.maint.replication.svnsync-partial


> svnsync isn't limited to full copies of everything which lives in a
> repository. It can handle various shades of partial replication, too. For
> example, while it isn't very commonplace to do so, svnsync does gracefully
> mirror repositories in which the user as whom it authenticates has only
> partial read access. It simply copies only the bits of the repository that
> it is permitted to see. Obviously, such a mirror is not useful as a backup
> solution.

Obviously in your case it may sync partially (without message) and thus
result in a smaller repository. You will get an initial hint by printing
the full HEAD tree of the root of each repository. Any differences there
would point to a problem.
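
The comparison of HEAD trees suggested above could be sketched as follows. The helper name missing_paths and the listing file names are made up for illustration; the listings themselves are assumed to come from `svn list -R` run against each repository root.

```shell
# Print paths that appear in the first listing but not in the second.
# $1, $2 = files holding the output of `svn list -R <repo-url>`.
missing_paths() {
    sorted_a=$(mktemp) || return 1
    sorted_b=$(mktemp) || { rm -f "$sorted_a"; return 1; }
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$sorted_a"
    LC_ALL=C sort "$2" > "$sorted_b"
    LC_ALL=C comm -23 "$sorted_a" "$sorted_b"
    rm -f "$sorted_a" "$sorted_b"
}
```

For example, save `svn list -R svn://source_server` to source.txt and `svn list -R svn://target_server/target_repo` to target.txt, then run `missing_paths source.txt target.txt`. Any output indicates paths the mirror did not receive.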


As you get two different sizes, both effects are likely to be in play, while
you only want #1 for a migration.

Regards,
Andreas