Hello, I am trying to identify a reasonable version control system for an unusual workflow. SVN is a major player in this space, so it is one of the systems I want to consider, but I've run into some problems. It is unclear to me whether these problems are specific to the SVN clients I have tested or whether they are a general consequence of the way SVN has been designed.
I would appreciate feedback on whether there are ways to make SVN work more effectively for my project, or alternatively, whether another version control system might be more suitable.

Workflow Specifications:

* ~1 million files under version control (> 99% are essentially text files, but a few are binary).
* Average file size ~50 kB, for a total archive of ~50 GB.
* Wide range of sizes: ~50% of files are less than 10 kB, but a couple are greater than 1 GB.
* Most updates occur through a batch process that changes ~10% of the files every two weeks (not the same 10% every time).
* A typical batch change modifies only a few percent of each affected file, so the total difference per batch update is only ~200 MB. (A back-of-envelope check of these numbers is in the P.S. below.)

Other Requirements:

* Must support random file / version access.
* Clients must run on Windows and Linux / Mac.
* Must allow for web-based repository viewing.
* Highly desirable to allow for partial checkout of subdirectories. (See the sparse-checkout sketch in the P.S.)

In my testing, SVN clients seem to behave badly when you throw very large numbers of files at them. TortoiseSVN, for example, can take hours for a relatively simple add operation on data samples that are only a fraction of the total intended size. Another of the SVN clients I tested (but won't bother naming) crashed outright when asked to work with 30,000 files. (A script for generating synthetic test trees like my samples is sketched in the P.S.)

Are there ways to use SVN with very large data sets that would improve its performance, for example alternative clients that might be better optimized for this workflow? I'd even consider recompiling a client if there were a simple way to get significant improvements.

My worry is that SVN may be designed in such a way that it will always perform poorly on a data set like mine, for example by requiring lots of additional file I/O to maintain all its records. Is that the case? (The P.S. includes a small script for measuring that overhead on a working copy.) If so, I would appreciate any recommendations for other version control systems that might be better tailored to working with very large data sets.

Thank you for your assistance.

-Robert A. Rohde
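P.S. A few supporting sketches, as promised above. First, a back-of-envelope check of the scale numbers in my specifications list, written out as a small Python script. Every value is just the estimate from the list (nothing here is measured), and the "few percent" per-file difference is taken as 4% for the arithmetic to come out at ~200 MB:

    # Back-of-envelope check of the scale described above.
    n_files = 1_000_000                      # files under version control
    avg_size_kb = 50                         # average file size

    total_gb = n_files * avg_size_kb / 1e6   # kB -> GB
    print(f"total archive: ~{total_gb:.0f} GB")          # ~50 GB

    # ~10% of files touched per batch update, assuming touched files
    # are roughly average-sized:
    touched_gb = total_gb * 0.10
    print(f"touched per update: ~{touched_gb:.0f} GB")   # ~5 GB

    # Only a few percent of each touched file actually differs, so the
    # real delta per update is small:
    delta_mb = touched_gb * 0.04 * 1000      # "a few percent" ~= 4%
    print(f"net delta per update: ~{delta_mb:.0f} MB")   # ~200 MB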
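Second, on partial checkouts: I gather that newer SVN releases (1.5 and later) support sparse working copies via the --depth and --set-depth options. If that's right, something like the following minimal sketch would cover the partial-checkout requirement; the repository URL and the subdirectory name are placeholders, not a real setup:

    import subprocess

    REPO = "https://svn.example.org/repo/trunk"   # placeholder URL
    WC = "workingcopy"                            # local working-copy path

    def svn(*args):
        # Run an svn command, raising an error if it fails.
        subprocess.run(["svn", *args], check=True)

    # Check out only the top level: subdirectories are created but left empty.
    svn("checkout", "--depth", "immediates", REPO, WC)

    # Later, deepen just the one subdirectory actually needed.
    svn("update", "--set-depth", "infinity", f"{WC}/subdir_of_interest")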
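Third, in case anyone wants to reproduce the client behavior I'm describing, this is roughly how one could generate a synthetic test tree like my samples. The counts and sizes are arbitrary parameters; the defaults are chosen to hit the 30,000-file mark where one client crashed for me:

    import os
    import random

    def make_tree(root, n_dirs=100, files_per_dir=300, avg_kb=50):
        # 100 dirs x 300 files = 30,000 files of mostly-text filler,
        # with sizes jittered around the 50 kB average.
        random.seed(0)                        # reproducible tree
        for d in range(n_dirs):
            dirpath = os.path.join(root, f"dir{d:04d}")
            os.makedirs(dirpath, exist_ok=True)
            for f in range(files_per_dir):
                size_kb = max(1, int(random.gauss(avg_kb, avg_kb / 2)))
                path = os.path.join(dirpath, f"file{f:05d}.txt")
                with open(path, "w") as fh:
                    fh.write("x" * (size_kb * 1024))

    make_tree("testdata")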
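Finally, on the bookkeeping I/O question: my understanding is that SVN working copies keep a pristine copy of every file inside per-directory .svn administrative areas, which would roughly double both disk usage and file I/O on a tree like mine. I haven't verified this against every client, but if it's right, a quick way to measure the overhead on an existing working copy would be:

    import os

    def svn_overhead(wc_root):
        # Sum bytes of real content vs. bytes under .svn directories.
        content = admin = 0
        for dirpath, dirnames, filenames in os.walk(wc_root):
            in_admin = ".svn" in dirpath.split(os.sep)
            for name in filenames:
                try:
                    size = os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    continue                  # file vanished or unreadable
                if in_admin:
                    admin += size
                else:
                    content += size
        return content, admin

    content, admin = svn_overhead("workingcopy")   # placeholder path
    print(f"content: {content/1e9:.2f} GB, "
          f".svn bookkeeping: {admin/1e9:.2f} GB")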