Hello all,
Please consider this patch for inclusion in the next major rdiff-backup
release. This is my first patch submission to rdiff-backup, so please
offer constructive comments if this patch needs adjusting.
Many people have been interested in sparse file support, as have I,
so here's the patch!
+ Blocks of Globals.blocksize bytes that consist entirely of \x00 bytes
are made sparse automatically (Globals.blocksize is currently 128k).
+ This feature has been requested a few times, and I have added
documentation to the SparseFiles wiki page:
http://wiki.rdiff-backup.org/wiki/index.php/SparseFiles
+ Any filesystem that creates a sparse file when you f.seek() beyond EOF
and then f.write() is supported.
+ This works for both local-copy and remote backups.
+ Works in conjunction with BlockFuse to back up sparse LVM snapshots:
http://www.globallinuxsecurity.pro/blog.php?q=rdiff-backup-lvm-snapshot
+ I am using this patch in production with ~600GB sparse LVM snapshots.
+ Backups of sparse files can be 2x faster, since the filesystem returns
a zero-filled buffer on f.read() for the holes rather than hitting the
disk and causing unnecessary IO.
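Whether a given filesystem actually leaves a hole for the seek-beyond-EOF-then-write pattern can be checked with a short script. This is a minimal sketch, assuming Python 3 on a POSIX system (st_blocks is not available on Windows); creates_holes is a hypothetical helper for illustration, not part of rdiff-backup:

```python
import os
import tempfile


def creates_holes(directory, hole_size=1024 * 1024):
    """Hypothetical helper: report whether `directory` stores holes sparsely.

    Seeks hole_size bytes past EOF in a fresh file, writes one byte (the
    same seek-then-write pattern the patch relies on), then compares the
    apparent size with the space actually allocated.
    Returns (apparent_size, allocated_bytes, is_sparse).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(hole_size, os.SEEK_CUR)  # skip ahead without writing anything
            f.write(b"\x00")                # the write fixes the final file size
        st = os.stat(path)
        allocated = st.st_blocks * 512      # Python reports st_blocks in 512-byte units
        return st.st_size, allocated, allocated < st.st_size
    finally:
        os.remove(path)
```

For example, creates_holes("/var/backups") returns a tuple whose last element is True when the filesystem under that directory allocated less space than the file's apparent size.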
Feedback and comments are appreciated!
Cheers,
--
Eric Wheeler
President
eWheeler, Inc.
dba Global Linux Security
www.GlobalLinuxSecurity.pro
503-330-4277
PO Box 14707
Portland, OR 97293
Index: rpath.py
===================================================================
RCS file: /sources/rdiff-backup/rdiff-backup/rdiff_backup/rpath.py,v
retrieving revision 1.142
diff -u -r1.142 rpath.py
--- rpath.py 23 Jun 2009 23:56:30 -0000 1.142
+++ rpath.py 3 Jan 2011 03:27:04 -0000
@@ -58,10 +58,44 @@
 def copyfileobj(inputfp, outputfp):
 	"""Copies file inputfp to outputfp in blocksize intervals"""
 	blocksize = Globals.blocksize
+
+	sparse = False
+	buf = None
 	while 1:
 		inbuf = inputfp.read(blocksize)
 		if not inbuf: break
-		outputfp.write(inbuf)
+
+		if not buf:
+			buf = inbuf
+		else:
+			buf += inbuf
+
+		# Combine short reads until we have a full block
+		if (len(buf) < blocksize):
+			continue
+
+		buflen = len(buf)
+		if buf == "\x00" * buflen:
+			outputfp.seek(buflen, os.SEEK_CUR)
+			buf = None
+			# Flag that we seek()ed past a hole but have not written since;
+			# the file size is wrong until we write.
+			sparse = True
+		else:
+			outputfp.write(buf)
+			buf = None
+
+			# We wrote, so clear sparse.
+			sparse = False
+
+
+	if buf:
+		outputfp.write(buf)
+		buf = None
+
+	elif sparse:
+		outputfp.seek(-1, os.SEEK_CUR)
+		outputfp.write("\x00")
 
 def cmpfileobj(fp1, fp2):
 	"""True if file objects fp1 and fp2 contain same data"""
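The patch targets Python 2, where read() returns str. Under Python 3, binary reads return bytes, and a bytes buffer never compares equal to the str "\x00" * buflen, so every block would be written. A minimal sketch of the same loop for Python 3, assuming blocksize stands in for Globals.blocksize (128 KiB here); copy_sparse is a hypothetical name, not an rdiff-backup function:

```python
import io
import os


def copy_sparse(inputfp, outputfp, blocksize=128 * 1024):
    """Copy inputfp to outputfp, seeking over all-zero blocks instead of
    writing them, following the same approach as the patch above."""
    sparse = False  # True while we have seeked past a hole but not written since
    buf = b""
    while True:
        inbuf = inputfp.read(blocksize)
        if not inbuf:
            break
        buf += inbuf
        # Combine short reads until we hold at least one full block.
        if len(buf) < blocksize:
            continue
        if buf == bytes(len(buf)):  # bytes(n) is n zero bytes
            outputfp.seek(len(buf), os.SEEK_CUR)  # leave a hole
            sparse = True
        else:
            outputfp.write(buf)
            sparse = False
        buf = b""
    if buf:
        # Short tail: write it as-is, which also sets the final size.
        outputfp.write(buf)
    elif sparse:
        # The input ended in a hole: step back one byte and write a zero
        # so the output's size matches the input's.
        outputfp.seek(-1, os.SEEK_CUR)
        outputfp.write(b"\x00")
```

io.BytesIO zero-fills when written past its end, so it is a convenient stand-in for a sparse-capable file when checking that the copied contents match the input.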
_______________________________________________
rdiff-backup-users mailing list
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki