Mark Phippard wrote on Fri, Sep 12, 2014 at 11:24:43 -0400:
> On Fri, Sep 12, 2014 at 11:17 AM, Thomas Harold <thomas-li...@nybeta.com>
> wrote:
>
> > I have a question about how efficient SVN is at de-duplication within a
> > repository with regards to files that appear in multiple locations, but
> > which have the same content.
> >
> > I know a small improvement was made in 1.8...
> >
> > http://subversion.apache.org/docs/release-notes/1.8.html#fsfs-enhancements
> >
> > > When representation sharing has been enabled, Subversion 1.8 will now
> > > be able to detect files and properties with identical contents within
> > > the same revision and only store them once. This is a common
> > > situation when you for instance import a non-incremental dump file or
> > > when users apply the same change to multiple branches in a single
> > > commit.
> >
> > #1 - If a commit puts files A, B and C into the repository, and a later
> > commit puts files B, C and D into the repository at a different
> > location, is SVN smart enough to realize that B and C are already stored
> > in the repository?
> >
> > In other words, does it track each individual file separately, even if
> > they were all part of one big revision?
> >
>
> Representation cache is based on the sha of the rep. So it does not matter
> what the filename is or where it is stored. If it has the same sha as an
> existing rep, then it will be shared.
>
> The small improvement in 1.8 was simply to do this for files being added
> within the same revision, but the other scenario was already supported.
>
> I think it is worth pointing out that a rep is not necessarily a "file".
> It is the specific delta that SVN would be storing in the repository DB.
The sha1 of the rep itself doesn't matter. The rep-cache.db file is a cache
of (sha1 of fulltext ↦ location of rep generating that fulltext).

As to the idea of doing the sha1 at chunk level rather than at file level:
I suggest discussing that on dev@. Some backend devs might otherwise miss
the discussion.

Cheers,

Daniel
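To make the "sha1 of fulltext, not of the rep" distinction concrete, here is a toy Python model of a rep cache. It is a hypothetical sketch, not Subversion's FSFS code: `ToyRepCache`, `RepLocation` and the byte-offset counter are invented for illustration, and real details (deltification, the on-disk rep-cache.db schema, exactly when entries get written during a commit) are ignored. The point it shows is that the lookup key is the SHA-1 of the file's fulltext, so the path never enters into it and B and C added in r2 resolve to the reps already written in r1.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class RepLocation:
    """Hypothetical stand-in for 'where a rep lives': a revision and an offset."""
    revision: int
    offset: int


class ToyRepCache:
    """Toy model: maps SHA-1 of a fulltext to the location of a rep producing it."""

    def __init__(self) -> None:
        self._cache: dict[str, RepLocation] = {}
        self._next_offset = 0  # hypothetical position counter for newly written reps

    def store(self, revision: int, fulltext: bytes) -> tuple[RepLocation, bool]:
        """Return (location, reused). Only the content is hashed; the path is irrelevant."""
        key = hashlib.sha1(fulltext).hexdigest()
        existing = self._cache.get(key)
        if existing is not None:
            return existing, True  # identical fulltext already stored: share it
        loc = RepLocation(revision, self._next_offset)
        self._next_offset += len(fulltext)
        self._cache[key] = loc
        return loc, False


# r1 adds A, B, C; r2 adds B, C, D under a different location.
cache = ToyRepCache()
r1 = {"trunk/A": b"alpha", "trunk/B": b"bravo", "trunk/C": b"charlie"}
r2 = {"branches/x/B": b"bravo", "branches/x/C": b"charlie", "branches/x/D": b"delta"}
for rev, files in ((1, r1), (2, r2)):
    for path, text in files.items():
        loc, reused = cache.store(rev, text)
        print(rev, path, loc, "shared" if reused else "new")
```

Because the key is the hash of the expanded content, the stored rep itself could be a delta against anything; any later rep that would expand to the same bytes simply points at the existing one.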