On Jan 24, 2012, at 15:18, The Grey Wolf wrote:

> Hello, I'm not quite sure how to properly phrase the subject
> as a query term, so if this has been answered, please forgive
> the redundancy and quietly point me to where this gets addressed.
>
> We are using svn at work to hold customer 'vault' data [various bits
> of information for each customer]. It has been a huge success -- to
> the point where we have over 1,000 customers using vaults. The checkins
> are automated, and we have amassed over 100,000 revisions thus far.
>
> User directories are created as /Ab/username [where Ab is a 2-character
> hash via a known (balanced) algorithm to make location of username files more
> machine-efficient]. So we have about 1,200 of these guys, with some hashes
> obviously being re-used, no big deal.
>
> The problem is that, even on miniscule changes, we are finding the
> db/rev/<shard>/<revno> files to be disproportionately large; for an
> addition or change of a file that is about 1k-4k, the rev files are
> at 100K each. At lower revisions, we noticed that the rev files are
> 4k but have been increasing in size with each shard that gets added,
> usually to the tune of 1k/shard. With so many revisions being checked
> in at a rapid rate, we found ourselves having to take production off
> line for a couple of minutes while we migrated the repository in question
> to a larger filesystem due to the threat of the filesystem filling
> up.
>
> The upshot of this is: Why does a minimal delta create such a large
> delta file? 100k for a small change? What's going on and how can we
> mitigate this?

It probably has to do with the size of the directory entries, not with the changes you're making to the files. If you add a file, that's recorded as a change to the directory.

When you change a file, Subversion stores only the changes you made, not the complete new file, and it stores them compressed. However, when you change a directory (e.g. by adding or removing a file or directory), Subversion records a complete new copy of the directory, and I don't know whether it's compressed or not. If the directory has hundreds or thousands of items, that will take some space. I don't remember whether modifying a file counts as a change to the directory, but adding or deleting a file certainly does.

Based on this, I would assume you could mitigate the problem by having fewer items in each directory. Create a deeper directory structure from your hash: /A/Ab/username, or even /A/Ab/Abc/username.

You should try this out in a testing environment first. Either create some test data or dump your current repository, and then:

a) load it into a fresh empty repository as-is, and
b) transform it into a deeper directory structure using a tool like svndumptool, then load that into a second fresh empty repository.

Then see whether there is an appreciable size difference between the two repositories.
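
To make the suggested experiment a little more concrete, here is a rough Python sketch of the two pieces involved: mapping an existing /Ab/username path to the deeper /A/Ab/username layout, and totalling the on-disk size of a directory tree so the two test repositories can be compared. The function names `deepen_path` and `tree_size` are made up for illustration, and the sketch assumes the first path component is always the existing 2-character hash. It only computes the new paths; actually rewriting a dump file is still a job for a tool like svndumptool.

```python
import os


def deepen_path(path):
    """Map '/Ab/username/...' to '/A/Ab/username/...'.

    Assumption: the first path component is the existing 2-character
    hash; the new top-level directory is simply its first character.
    """
    parts = path.strip("/").split("/")
    bucket = parts[0]
    if len(bucket) != 2:
        raise ValueError(f"expected a 2-character hash directory, got {bucket!r}")
    return "/" + "/".join([bucket[0], bucket] + parts[1:])


def tree_size(root):
    """Return the total size in bytes of all files below root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


if __name__ == "__main__":
    print(deepen_path("/Ab/username"))
```

After loading the unchanged dump and the transformed dump into two fresh repositories, calling `tree_size` on each repository's db directory should show whether the deeper layout makes an appreciable difference.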