I wasn't really prepared to make this announcement so soon, but now seems like a good time to let the community know. I've been working on a new implementation of rdiff-backup for about a month, ever since I dug into the current codebase and found its quality disappointing. What I have right now is functional and works on simple cases, but it does not yet cover the broad range of features currently offered by rdiff-backup. I could use some help in bringing it up to par if others are interested in the path I have taken. Although I have used the current codebase for direction and inspiration, I have started with a clean slate for several reasons:

- An automated test suite makes adding new features and long-term maintenance much easier. Adding this to the current codebase is both hard and boring. One thing that makes it very hard to write tests for the current codebase is the widespread use of globals. My new implementation has been developed using TDD and minimal use of globals (e.g. for loggers and constants).

- The current repository layout has a critical design flaw that causes performance degradation as a repository grows. Most difference information is stored in a single file tree (rdiff-backup-data/increments), which has a very similar structure to the mirror. The problem is that as files get added/deleted/changed, the directories in the increments tree keep growing in size, so it takes longer and longer to list the contents of directories in the tree. This performance problem is negligible in small-to-medium sized backup sets, but becomes apparent in very large backup sets as the number of increments grows. I have redesigned the repository layout in my new implementation to eliminate this performance issue. Note that I do not know for sure whether my new layout will completely eliminate this problem, since I have not yet tested it with a very large backup set over a long period of time.

- When the current version of rdiff-backup fails, it often aborts completely, leaving the repository in a state that needs to be rolled back to the previous backup state in order to continue using it. While this is a good conservative approach, it potentially results in the loss of difference data that could otherwise be saved. I have designed my new version to recover better from errors: it simply logs unexpected errors and skips the current task rather than aborting completely. I also have plans to make it possible to retain incremental data from a failed backup rather than simply discarding it.

- There is currently no (efficient) way to do a complete verification of all data in a repository. My new version was designed with this as a requirement.

- Although it is not implemented yet, I have some ideas of how to make use of multiple cores to speed up rdiff-backup once the initial backup has been created. Backups after the first one (which is usually IO bound) are often CPU bound; using multiple cores could help to speed up backups.

- Another thing that is planned but not yet implemented is the ability to remove all traces of selected files from a backup repository. This should be a built-in feature of rdiff-backup, since it is a common occurrence to have to remove files that were backed up by mistake. Currently it is only possible to do this by hand (very error prone), which I find unacceptable.

- Did I mention that the new version has been developed from the ground up with full Unicode support?

Please note that this new version will obviously not be backward compatible with older rdiff-backup repositories. While a tool could conceivably be written to convert an old repository to the new format, I have no desire to do so, and I doubt that anyone else will either...

I have developed this new version using git for version control, which I plan to continue using. I am hoping to put it up on GitHub soon.

~ Daniel

_______________________________________________
rdiff-backup-users mailing list at [email protected]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
