> > I have a svn 1.9 repository, created with svnsync, that has ~150000
> > revisions and size about 45 GB.
>
> 300kB/rev is quite large, like >1 MB of changes before
> compression - on average. Are these office documents,
> large xml / html files or simply many files per commit?

The content is mixed: quite a lot of small source code commits, but office documents and zip archives as well. There are even a few extremely huge commits; the biggest one is 3+ GB, one is 800+ MB and one is 500+ MB (as per the revision file sizes in the db/revs folder).
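For reference, the "largest revision files" check can be reproduced with a few lines of Python. This is only a sketch: the function name `largest_files` is made up for illustration, and it simply ranks files by on-disk size, so in a packed repository a whole shard's pack file would show up as one entry rather than individual revisions.

```python
import heapq
import sys
from pathlib import Path


def largest_files(root, count=10):
    """Return the `count` largest files under `root` as (size_in_bytes, path) pairs."""
    sizes = (
        (p.stat().st_size, str(p))
        for p in Path(root).rglob("*")
        if p.is_file()
    )
    # nlargest sorts by the first tuple element (the size), biggest first.
    return heapq.nlargest(count, sizes)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python largest_revs.py /path/to/repo/db/revs
    for size, path in largest_files(sys.argv[1]):
        print(f"{size / 1e6:12.1f} MB  {path}")
```

Pointing it at db/revs of the repository lists the revisions that dominate the total size.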

> > Due to some issues in svn-all-fast-export I
> > wanted to have svn 1.8 version repository so I downgraded it by doing
> > svnadmin (v 1.9) dump /svnadmin (v 1.8) load cycle. I was surprised that
> > the size of v 1.8 repository is "only" 37.5 GB
> > I tried to compare content of db\revs folder: some files are bigger in 1.8
> > repo, some in 1.9 repo.
>
> For the record: you already said elsewhere in this
> thread that you used 1.8 to create the 1.8 repo and
> 1.9 for the 1.9. I also assume standard settings
> as in "no fsfs.conf tweaks".

Correct.

> There is a simple way to compare the "content size"
> your repositories. Run the 1.9 svnfsfs tool on both:
>
> svnfsfs stats -M 1000 /path/to/repo > /some/output/path
>
> It basically reads the whole repository, groups and
> aggregates the item sizes and produces a long report.
> Number of changes and node revision should be more
> or less (exactly?) the same. If they are, you'll
> be good.
>
> "Representation" size is where the numbers will differ.
> Looking at the differences in detail, you should be able
> to pin down one or two file extensions that account for
> most of the increase. It would be interesting to learn
> what is special about them ...

Yes, the number of changes and the number of node revision records are identical. The number of representations does differ (1.744.149 @1.8 vs 1.901.312 @1.9).

The "nodes total", "directory noderevs" and "file noderevs" numbers are identical.

The "Largest representations:" section shows that 1.9 has failed to de-duplicate several files (executables in this case).

The "Extensions by number of representations:" section shows that all extensions have a bigger number of representations in the 1.9 repo.

The size of representations has increased most for the .exe and .pdf extensions, where .exe causes a 5 GB increase and .pdf 500 MB. Several types cause an increase of ~300 MB, and "others" have +1 GB.

The dump/load cycle into 1.9 is finished as well; the result is now 36.2 GB (smaller than the 1.8 repo, which was 37.5 GB).
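
The per-file size comparison of two db\revs folders mentioned in the quoted text could be done roughly like this. It is a minimal sketch with made-up helper names (`file_sizes`, `size_differences`), and it assumes both repositories use the same sharding and packing so that relative paths line up. Equal sizes do not prove identical content.

```python
import os


def file_sizes(root):
    """Map each file's path relative to `root` to its size in bytes."""
    sizes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def size_differences(root_a, root_b):
    """Return (relative_path, size_a, size_b) for files whose sizes differ.

    A file missing on one side is reported with size 0 on that side.
    Results are sorted by absolute size difference, largest first.
    """
    a, b = file_sizes(root_a), file_sizes(root_b)
    diffs = [
        (path, a.get(path, 0), b.get(path, 0))
        for path in sorted(set(a) | set(b))
        if a.get(path, 0) != b.get(path, 0)
    ]
    diffs.sort(key=lambda d: abs(d[1] - d[2]), reverse=True)
    return diffs
```

Calling `size_differences` on the two db/revs folders puts the revisions with the largest size gap at the top of the list.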

Both 1.9->1.9 and 1.8->1.9 resulted in almost identical repos when comparing files byte by byte (the exception is the UUID file)... Which makes me wonder if I dumped the same repo twice. Too bad the Windows cmd doesn't retain command history.

Gert