Hi guys, I have a strange problem with failing backups on BackupPC.
I'm backing up a Samba fileshare on a Linux box over a local LAN to another Linux box running BackupPC. The fileshare is fairly large (933 GB in total). The backups are run using rsync.

I used to back up the entire share directory as one host, e.g. /share (933 GB in total). This didn't work, as I would receive errors. I then split the /share directory into smaller "chunks" and backed each one up as an individual host, e.g.:

/share/directory1 = fileshare1
/share/directory2 = fileshare2
/share/directory3 and /share/directory4 = fileshare3

I split the sub-directories under the parent /share to be as equal in size as possible without going too deep into the directory structure. I have a total of 5 "fileshare hosts", which are backed up independently of each other and together cover the sub-directories under /share. Their sizes are as follows:

fileshare1 = 276 GB
fileshare2 = 84 GB
fileshare3 = 252 GB
fileshare4 = 338 GB
fileshare5 = less than 1 GB

The first backup run completed successfully; each "host" was backed up correctly. Since then, however, I've been experiencing backup failures on fileshare4 only (it happens to be the largest chunk). fileshare4 consists of the following two directories:

/share/data/Samba-Info : 95 MB
/share/data/Shared_Services : 338 GB

I have increased ClientTimeout to 360000 (about 4 days). As I understand it, "aborted by signal=PIPE" could be related to filesystem errors, network errors or memory issues. There are no filesystem errors in the logs at all. Both machines have 2 GB of memory each; the fileshare machine being backed up has 2 Intel Xeon processors @ 2.66 GHz and the backup machine has an Intel Core 2 Duo CPU @ 2.33 GHz. With the backup running, swap is not being used, so the hardware side seems fine.
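As a quick sanity check on the timeout figure, here is a minimal Python sketch that converts the ClientTimeout value into hours and days and compares it with the run time of about 7.5 hours reported in this post. The constant and function names are my own and are not BackupPC settings; the only assumption about BackupPC is that $Conf{ClientTimeout} is given in seconds.

```python
# Sanity check of the ClientTimeout value quoted in this post.
CLIENT_TIMEOUT_S = 360000   # $Conf{ClientTimeout} as set here, assumed to be in seconds
TYPICAL_RUN_H = 7.5         # approximate run time of the backup, as reported in this post


def seconds_to_hours_days(seconds):
    """Convert a duration in seconds to a (hours, days) tuple."""
    hours = seconds / 3600
    return hours, hours / 24


timeout_h, timeout_d = seconds_to_hours_days(CLIENT_TIMEOUT_S)
print(f"ClientTimeout: {timeout_h:.0f} h = {timeout_d:.2f} days")
print(f"Run time as a fraction of the timeout: {TYPICAL_RUN_H / timeout_h:.1%}")
```

This gives 100 hours, or a little over 4 days. The run time is a small fraction of that, so the timeout itself does not look like what is ending the transfer.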
I initially suspected network errors. After some searching, it seems there might be an issue with the e1000 card when TSO is enabled. I disabled TSO on the NICs of both machines, but I still get the same errors, and only on the largest fileshare. The backup usually runs for about 7.5 hours.

The following is the Xfer error summary for fileshare4:

-------------------------------------------------------
full backup started for directory /share/data/Samba-Info (baseline backup #20)
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --ignore-times . /share/data/Samba-Info/
Xfer PIDs are now 2590
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 2590,2673
[ skipped 5 lines ]
Done: 4 files, 98952554 bytes
full backup started for directory /share/data/Shared_Services (baseline backup #20)
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --ignore-times . /share/data/Shared_Services/
Xfer PIDs are now 2817
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 2817,2896
[ skipped 329963 lines ]
Can't write 33792 bytes to socket
Read EOF: Connection reset by peer
Tried again: got 0 bytes
Child is aborting
Done: 304800 files, 350534594199 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Saving this as a partial backup
full backup started for directory /share/data/Samba-Info; updating partial #21
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --ignore-times . /share/data/Samba-Info/
Xfer PIDs are now 16009
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 16009,16134
[ skipped 5 lines ]
Done: 4 files, 98952554 bytes
full backup started for directory /share/data/Shared_Services; updating partial #21
Running: /usr/bin/ssh -q -x -l root 172.16.0.6 /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group -D --links --hard-links --times --block-size=2048 --recursive --ignore-times . /share/data/Shared_Services/
Xfer PIDs are now 16139
Got remote protocol 30
Negotiated protocol version 28
Xfer PIDs are now 16139,16578
[ skipped 74785 lines ]
Remote[2]: file has vanished: "/share/data/Shared_Services/IT/(Public)/General/Software/Audit2009_09/~$ftwareAudit_Sep09c.doc"
[ skipped 255274 lines ]
Can't write 33792 bytes to socket
Read EOF: Connection reset by peer
Child is aborting
Done: 304885 files, 350634798658 bytes
Got fatal error during xfer (aborted by signal=PIPE)
Backup aborted by user signal
Saving this as a partial backup
-------------------------------------------------------

Any ideas / suggestions?

Thanks.
