[BackupPC-users] Trying to understand exactly how BackupPC works

Zach La Celle Mon, 30 Dec 2013 08:58:42 -0800

I'm reading through the BackupPC documentation and trying to understand how
it functions.  I'm a little confused about two parts:


1) Difference between full and incremental: the documentation says a full
backup covers all files.  I take this to mean that every file is pulled
down, then checked and pooled.  An incremental first checks for
modification, copies only the files that are different, then pools them.
However, if both are working correctly, they should (in the end) be
semantically equivalent: any file that a full would copy and an
incremental would skip is already in the pool, so nothing new is written
to disk other than hard links.  So, why ever do a full backup after
completing the first one?  Why not run one full, then always do
incrementals?
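
To make the model in (1) concrete, here is a minimal sketch of how I picture the two backup types choosing files. It is only an illustration of my understanding, not BackupPC code: the function name select_files, the last_backup_time parameter, and the use of the file mtime as the "modification" check are all my own assumptions.

```python
import os

def select_files(paths, last_backup_time=None):
    """Return the paths a backup run would transfer (hypothetical model).

    last_backup_time=None models a full backup: every file is selected.
    Otherwise only files whose mtime is newer than the previous backup
    are selected, which is how I assume an incremental decides.
    """
    if last_backup_time is None:
        return list(paths)
    return [p for p in paths if os.path.getmtime(p) > last_backup_time]
```

Under this model the two runs differ only in which files cross the network; whether they always leave the same result on disk is exactly what I am asking.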

2) The order of operations is ping->dump->extract->link.  I'm trying to
understand the file compare/extract/link part.  The available transfer
methods are rsync, rsyncd, ftp, smb, and tar.  The documentation says
that incoming data is extracted to __TOPDIR__/pc/$host/new (if it's
compressed), with tarExtract checking the MD5 hash of files as they come
in.  It only checks the first N bytes, meaning that the hash can be
completed 100% in memory, before the file is written to disk.  However,
it then says "BackupPC_tarExtract and rsync can handle arbitrarily large
files and multiple candidate matching files without needing to write the
file to disk in the case of a match."  This I don't quite understand: if
the file is larger than my memory, it has to be stored while being
compared bit-by-bit with other MD5-matching files to do the full compare.
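
As a rough sketch of the in-memory step described above, assuming the digest really covers only the first N bytes (the name partial_digest and the 1 MiB default for N are placeholders of mine, and the real BackupPC digest may be computed differently):

```python
import hashlib
import io

def partial_digest(stream, n_bytes=1 << 20):
    """Hash only the first n_bytes of a stream, entirely in memory.

    Returns the hex digest together with the bytes that were consumed,
    so the caller can still use them for a later comparison or write.
    Assumes a buffered stream whose read() returns n_bytes when available.
    """
    head = stream.read(n_bytes)
    return hashlib.md5(head).hexdigest(), head
```

Two files that share the same first N bytes get the same digest here, which is why a digest match can only nominate candidates and a full comparison is still needed.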

Basically, for the extract->compare part:
* The first N bytes of incoming files are written to memory, then the
md5 is performed against all existing files, then what?  If it doesn't
find a match, it's written to the pool, but if it does, it still has to
write the file (there could be md5 collisions), correct?  Does it store
a list of the MD5 collisions?  Because in the BackupPC_link
documentation, it says it has to check against all files again, since
there could have been some added by another link process.
* Rsync is special because it doesn't have to write a file to disk in
case of a match?  How does this work with link?
* The above is not true for ssh, ftp, rsyncd, etc?
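
One way I can imagine the "no write in the case of a match" claim working is a chunk-by-chunk comparison against every candidate at once, so that only one chunk of the incoming file is ever held in memory. The sketch below is my guess at the idea, not BackupPC's implementation; match_stream, its parameters, and the chunk size are hypothetical.

```python
import io

def match_stream(incoming, candidates, chunk_size=65536):
    """Compare an incoming stream with candidate files chunk by chunk.

    Returns the index of a candidate whose full contents equal the
    incoming stream, or None if every candidate diverges. Streams are
    assumed to be buffered, so read(n) returns n bytes when available.
    """
    alive = dict(enumerate(candidates))  # candidates that still match
    while True:
        chunk = incoming.read(chunk_size)
        for idx in list(alive):
            # At end of input, ask for one byte so a longer candidate fails.
            if alive[idx].read(len(chunk) or 1) != chunk:
                del alive[idx]
        if not alive:
            return None  # no candidate left: the file is new
        if not chunk:
            break  # incoming stream exhausted
    return min(alive)
```

This sketch stops as soon as no candidate remains. A real implementation would presumably have to start writing the new file at that point and somehow recover the part of the stream it had already consumed, which is the step I don't understand.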

Thank you!

