On Fri, Feb 8, 2013 at 10:57 AM, Nico Kadel-Garcia <nka...@gmail.com> wrote:
>>
>>> In my $work, we manage thousands of binary files (tiffs). We may modify a
>>> file once or twice before eventually entering the file as a record. Files
>>> arrive in groups (a submission) and I would like to track changes and the
>>> history of a file. Once the file is entered as a record, I could remove much
>>> of the history.
>>>
>>> I've used subversion for software version control and I am wondering if I
>>> would be stretching its features to versioning thousands of binary files
>>> (currently 13,000 since the start of 2013) at about 60MB each file.
>>>
>>> Apart from the size of the diffs/deltas, I am struggling to envisage a way
>>> to organise the repo. Making a new project for each submission would
>>> make the whole repo unwieldy.
>>>
>>> Has anyone used subversion for this type of tracking? Does what I'm
>>> proposing sound feasible? Any thoughts would be appreciated.
>>
>> I don't believe there is a reasonable way to ever remove anything from
>> a subversion repository such that it releases the space used for the
>> thing you removed. So, I wouldn't consider this with subversion
>> unless you can work out a way to make separate repositories for one or
>> a few files so it would be feasible to just remove the whole thing if
>> you no longer need it or 'svnadmin dump/filter/load' to restructure
>> them.
>
> Separate repositories linked together by "svn:externals" settings can
> do this, with a central "build" structure publishing tags or branches
> with hooks to specific releases of components from other repos. But
> resource tracking can get awkward. Some old legacy repo that only one
> project was using can wind up culled, with managerial approval, and
> discovered to be critical to another legacy tool or two that no one
> has built for a few years and kept saying "if it's not broken, don't
> fix it". So factoring the repositories well, and having good archival
> backups, can be invaluable.

You can simply put a bunch of repos under the top level served by http
or svn and it appears pretty seamless except for when you have to
create a new one. But, since binary diffs aren't very useful anyway
and that might have scaling issues, I think I'd just try to use a
de-duping filesystem like zfs and store as many copies as might still
be useful.

--
  Les Mikesell
  lesmikes...@gmail.com
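
For anyone wondering what "store as many copies as might still be useful" buys you on a de-duping store, here is a rough Python sketch of the idea. It is only an analogy: zfs de-duplicates blocks inside the filesystem and needs no application code at all, whereas this hashes whole files in user space. The names store_copy, store_dir and index are made up for the illustration and do not come from any tool mentioned in this thread.

```python
import hashlib
import os
import shutil


def store_copy(src_path, store_dir, index):
    """Record one more copy of src_path; write its bytes only if unseen.

    Hypothetical helper: store_dir holds one blob per distinct content
    hash, and index maps a file name to the list of hashes it has had.
    """
    h = hashlib.sha256()
    with open(src_path, "rb") as f:
        # Read in 1 MiB chunks so a 60MB tiff is never loaded whole.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()

    blob = os.path.join(store_dir, digest)
    if not os.path.exists(blob):
        # New content: this is the only case that uses more disk space.
        shutil.copyfile(src_path, blob)

    # Every copy is remembered in the history, duplicate or not.
    index.setdefault(os.path.basename(src_path), []).append(digest)
    return digest
```

Saving the same unmodified tiff twice adds a second history entry but no second blob; only a modified file costs another 60MB. Dropping a submission once it has been entered as a record would then amount to deleting its index entries and any blobs nothing else refers to, which is the part subversion makes hard.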