On 9/30/24 09:39, Default User wrote:
> Hi!
> On a thread at another mailing list, someone mentioned that they, each
> day, alternate doing backups between two external usb drives. That got
> me to thinking (which is always dangerous) . . .
> I have a full backup on usb external drive A, "refreshed" daily using
> rsnapshot. Then, every day, I use rsync to make usb external drive B an
> "exact" copy of usb external drive A. It seemed to be a good idea,
> since if drive A fails, I can immediately plug in drive B to replace
> it, with no down time, and nothing lost.
> But of course, any errors on drive A propagate daily to drive B.
> So, is there a consensus on which would be better:
> 1) continue to "mirror" drive A to drive B?
> or,
> 2) alternate backups daily between drives A and B?

I migrated my data to a dedicated ZFS file server several years ago, in
part due to advanced ZFS backup features -- snapshots, compression,
de-duplication, replication, etc. I used FreeBSD, but Debian has ZFS
and should be able to do the same thing.
My live server has a ZFS pool of two striped mirrors, each with two
3 TB HDDs, plus a 'special' mirror of two 180 GB SSDs:
2024-09-30 16:44:38 toor@f5 ~
# zpool iostat -v p5
                                  capacity     operations     bandwidth
pool                            alloc   free   read  write   read  write
------------------------------  -----  -----  -----  -----  -----  -----
p5                              3.19T  2.39T     49      2  28.4M  69.2K
  mirror-0                      1.58T  1.14T     21      0  14.0M  10.7K
    gpt/hdd1.eli                    -      -      8      0  6.99M  5.35K
    gpt/hdd2.eli                    -      -     12      0  6.99M  5.35K
  mirror-1                      1.58T  1.13T     20      0  14.0M  10.4K
    gpt/hdd3.eli                    -      -     10      0  7.00M  5.20K
    gpt/hdd4.eli                    -      -      9      0  7.00M  5.20K
special                             -      -      -      -      -      -
  mirror-2                      29.4G   120G      7      2   408K  48.1K
    gpt/ssd1.eli                    -      -      3      1   204K  24.1K
    gpt/ssd2.eli                    -      -      3      1   204K  24.1K
------------------------------  -----  -----  -----  -----  -----  -----
The 'special' SSD mirror stores metadata, which improves overall
performance.
I create ZFS filesystems for groups of data -- Samba users, CVS
repository, rsync(1) backups of various non-ZFS filesystems, raw disk
image backups, etc.
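A minimal sketch of how such a layout could be created, assuming the
parent filesystem carries an explicit mountpoint that its children
inherit (the property listing for p5/samba/dpchrist reports its
mountpoint as inherited from p5/samba). The helper name is made up for
illustration, and it only prints the commands so they can be reviewed
before being run as root:

```shell
#!/bin/sh
# Print the commands to create a parent filesystem with an explicit
# mountpoint and one child filesystem that inherits from it.
# $1 = parent dataset, $2 = parent mountpoint, $3 = child name
# (print_create_commands is a hypothetical helper name)
print_create_commands() {
    printf 'zfs create -o mountpoint=%s %s\n' "$2" "$1"
    printf 'zfs create %s/%s\n' "$1" "$3"
}
```

For example, "print_create_commands p5/samba /var/local/samba dpchrist"
prints one "zfs create" line for the parent and one for the child.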
ZFS has various properties that you can tune for each filesystem. Here
is the filesystem for my Samba data:
2024-09-30 16:50:07 toor@f5 ~
# zfs get all p5/samba/dpchrist | sort | egrep 'NAME|inherited'
NAME               PROPERTY               VALUE                      SOURCE
p5/samba/dpchrist  atime                  off                        inherited from p5
p5/samba/dpchrist  com.sun:auto-snapshot  true                       inherited from p5
p5/samba/dpchrist  compression            on                         inherited from p5
p5/samba/dpchrist  dedup                  verify                     inherited from p5
p5/samba/dpchrist  mountpoint             /var/local/samba/dpchrist  inherited from p5/samba
p5/samba/dpchrist  special_small_blocks   16K                        inherited from p5
'atime' is off to eliminate metadata writes when files and directories
are read.
'com.sun:auto-snapshot' is true so that zfs-auto-snapshot(8) run via
crontab(1) will find this filesystem, take snapshots periodically
(daily, monthly, yearly), and manage (prune) those snapshots:
2024-09-30 16:54:00 toor@f5 ~
# crontab -l
9 3 * * * /usr/local/sbin/zfs-auto-snapshot -k d 40
21 3 1 * * /usr/local/sbin/zfs-auto-snapshot -k m 99
27 3 1 1 * /usr/local/sbin/zfs-auto-snapshot -k y 99
I currently have 96 snapshots (i.e. backups) of the above filesystem
going back four and a half years:
2024-09-30 16:59:48 dpchrist@f5 ~
$ ls -d /var/local/samba/dpchrist/.zfs/snapshot/zfs-auto-snap_[dmy]* | wc -l
96
2024-09-30 17:01:12 dpchrist@f5 ~
$ ls -dt /var/local/samba/dpchrist/.zfs/snapshot/zfs-auto-snap_[dmy]* |
tail -n 1
/var/local/samba/dpchrist/.zfs/snapshot/zfs-auto-snap_m-2020-03-01-00h21
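Since each snapshot appears as a read-only directory tree under
.zfs/snapshot, getting an old version of a file back is an ordinary
file copy. A small sketch, assuming only that layout; the helper name
and the file path in the example are made up for illustration:

```shell
#!/bin/sh
# List every snapshot copy of one file, using the hidden .zfs/snapshot
# directory under the filesystem mountpoint.
# $1 = filesystem mountpoint, $2 = file path relative to the mountpoint
# (snapshot_copies is a hypothetical helper name)
snapshot_copies() {
    for snap in "$1"/.zfs/snapshot/*; do
        if [ -e "$snap/$2" ]; then
            printf '%s\n' "$snap/$2"
        fi
    done
}
```

For example, "snapshot_copies /var/local/samba/dpchrist notes.txt"
prints one path per snapshot that contains notes.txt; any of those
paths can then be copied back with cp -p.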
'compression' is on so that compressible files are compressed. (The
default compression algorithm will skip files that are incompressible.)
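'dedup' is set to 'verify' so that duplicate blocks are saved only once
within the pool, and only after a byte-for-byte comparison confirms
that blocks with matching checksums really are identical.
De-duplication metadata is stored on the pool 'special' SSD mirror,
which improves de-duplication performance.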
'special_small_blocks' is set to 16K so that data blocks of 16 KiB and
smaller (in practice, small files) are stored on the pool 'special' SSD
mirror, which improves small file read and write performance.
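For reference, a sketch of how the properties described above could be
applied. It assumes they are set once on the pool root, which is what
"inherited from p5" in the listing suggests; the helper name is made up,
and it only prints the commands so they can be reviewed first:

```shell
#!/bin/sh
# Print "zfs set" commands that apply the properties discussed above to
# a dataset; child filesystems inherit them unless they override them.
# $1 = dataset (the pool root p5 in the property listing)
# (print_property_commands is a hypothetical helper name)
print_property_commands() {
    for prop in atime=off compression=on dedup=verify \
                special_small_blocks=16K com.sun:auto-snapshot=true; do
        printf 'zfs set %s %s\n' "$prop" "$1"
    done
}
```

Running "print_property_commands p5" prints five "zfs set" lines, one
per property.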
I have a backup server with matching pool construction. I periodically
replicate live server snapshots to the backup server (via SSH pull and
pre-shared keys). I would like to automate this task.
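One possible starting point for automating the pull, as a sketch rather
than a finished tool. It assumes the backup server can reach the live
server over SSH with keys already in place, and that the caller knows
the newest snapshot both sides share. The function name, host name, and
destination dataset are all hypothetical:

```shell
#!/bin/sh
# Pull-style incremental replication, meant to run on the backup server.
# $1 = ssh host of the live server       $2 = source dataset
# $3 = newest snapshot both sides have   $4 = newest snapshot on the source
# $5 = destination dataset on the backup server
# Set DRY_RUN=1 to print the pipeline instead of running it.
# (pull_snapshots is a hypothetical helper name)
pull_snapshots() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'ssh %s zfs send -I %s@%s %s@%s | zfs receive -u %s\n' \
            "$1" "$2" "$3" "$2" "$4" "$5"
    else
        # -I sends every snapshot between the two named ones;
        # -u leaves the received filesystem unmounted.
        ssh "$1" zfs send -I "$2@$3" "$2@$4" | zfs receive -u "$5"
    fi
}
```

A dry run such as "DRY_RUN=1 pull_snapshots live p5/samba/dpchrist
snap1 snap2 bak/samba/dpchrist" shows the pipeline without touching
either pool. Working out the newest common snapshot automatically is
the harder part and is left out here.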
Both servers have SATA HDD mobile rack bays:
https://www.startech.com/en-us/hdd/drw150satbk
I have a pair of 6 TB HDDs in corresponding mobile rack trays, one for
near-site backups and one for off-site backups. Each HDD contains one ZFS
pool. I periodically insert the near-site HDD into the backup server
and replicate the live server snapshots to the removable HDD. I
periodically rotate the near-site HDD and the off-site HDD.
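The removable-drive step follows an import, replicate, export pattern.
A sketch under the assumption that the removable pool mirrors the
dataset layout of the backup pool; the pool names and the function name
are hypothetical, and the function only prints the steps:

```shell
#!/bin/sh
# Print the steps for refreshing a removable single-disk pool.
# $1 = removable pool name      $2 = source dataset on the backup server
# $3 = newest snapshot both have, $4 = newest snapshot on the source
# The destination is the source dataset with its pool name replaced by $1.
# (print_removable_steps is a hypothetical helper name)
print_removable_steps() {
    printf 'zpool import %s\n' "$1"
    printf 'zfs send -I %s@%s %s@%s | zfs receive -u %s/%s\n' \
        "$2" "$3" "$2" "$4" "$1" "${2#*/}"
    printf 'zpool export %s\n' "$1"
}
```

Exporting the pool before pulling the tray matters, since it flushes
pending writes and marks the pool as cleanly detached.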
Be warned that ZFS has a non-trivial learning curve. I suggest the
Lucas books if you are interested:
https://mwl.io/nonfiction/os#fmzfs
https://mwl.io/nonfiction/os#fmaz
David