On 11/5/21 13:43, [email protected] wrote:
> We're using the GPFS filesystem, and doing filesystem snapshots every
> 15 minutes, with a limited set retained for at least 2 months. The
> snapshots allow for almost instant restores of recent data and comparison
> between different versions of files, without system administrator
> intervention.
>
> Because of snapshots, I'm planning to eliminate all nightly incremental
> & differential backups to tape. Tape backups would be only for
> archival/disaster-recovery purposes and for compliance with grant and
> data management requirements.
I'm not sure this is wise. Remember that if you lose an array, you lose
all of its snapshots too. Or would you consider that a
disaster-recovery scenario?
> The new strategy would be to do a full backup every 2 months, kept for
> 5 months. One backup would be kept for at least 2 years; the others would
> be rotated (media reused). For example:
>
>   January 2021     keep until January 2023
>   March 2021       keep until August 2021
>   May 2021         keep until October 2021
>   July 2021        keep until December 2021
>   September 2021   re-use March 2021 media, keep until February 2022
>   November 2021    re-use May 2021 media, keep until April 2022
>   January 2022     keep until January 2024
This is going to be complex. I think it's DOABLE, but you will need a
complex set of Pools and Schedules because of the way you're setting up
multiple retention times for backups of the same jobs at the same levels.
What you might need to do is run all of your Full backups to one Pool
that has five months retention, then every six months run a Copy job
that archives the most recent set of Full backups to a second Pool with
two years retention. This is probably the method that will result in
the least tearing out of hair.
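That two-Pool approach might be sketched in bacula-dir.conf roughly as
follows. All the names and retention periods here are placeholders, and
the resources are abbreviated (Client, FileSet, Storage, and Messages
directives omitted), so treat it as a sketch rather than a working
configuration:

```
# Rotating pool for the bimonthly Fulls (names are hypothetical).
Pool {
  Name = Full-5mo
  Pool Type = Backup
  Volume Retention = 5 months
  Recycle = yes
  AutoPrune = yes
  # Copy jobs selected from this pool land in the archive pool.
  Next Pool = Archive-2yr
}

# Long-term pool; volumes are never recycled automatically.
Pool {
  Name = Archive-2yr
  Pool Type = Backup
  Volume Retention = 2 years
  Recycle = no
}

# Run every six months to copy Fulls into the archive pool.
Job {
  Name = Archive-Copy
  Type = Copy
  Selection Type = PoolUncopiedJobs
  Pool = Full-5mo
}
```

One caveat: PoolUncopiedJobs selects every Full that hasn't been copied
yet, not only the most recent set, so with a six-month cadence it would
sweep up the intermediate Fulls too. Restricting the Copy job to the
latest set would need Selection Type = SQLQuery with a hand-written
query.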
> All tape backups would be done from a snapshot, so that no files within
> the source of the backup change during the process. A "run before job"
> script would dump coherent copies of databases, then create a filesystem
> snapshot dedicated to the backup. That snapshot would be removed when
> the backup is complete.
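A RunBeforeJob script along those lines might look like the sketch
below. The database dump command, paths, and snapshot name are all
assumptions; mmcrsnapshot/mmdelsnapshot are GPFS's snapshot commands.
DRYRUN is set to echo here so the sketch can be exercised without a
GPFS cluster; a real deployment would leave it empty:

```shell
#!/bin/sh
# Hypothetical RunBeforeJob script: dump databases to the filesystem,
# then create a snapshot dedicated to this backup job.
set -e

FSDEV=gpfs0                    # GPFS device name (assumption)
SNAP=bacula-backup             # snapshot dedicated to the backup
DUMPDIR=/gpfs/gpfs0/dbdumps    # where coherent DB dumps land (assumption)

DRYRUN=echo   # set to empty in production; echo makes this a dry run

# Dump a coherent copy of the databases onto the filesystem first,
# so the snapshot captures them.
$DRYRUN sh -c "mysqldump --single-transaction --all-databases > $DUMPDIR/all.sql"

# Create the dedicated snapshot.  The matching RunAfterJob script
# would remove it with:  mmdelsnapshot $FSDEV $SNAP
$DRYRUN /usr/lpp/mmfs/bin/mmcrsnapshot "$FSDEV" "$SNAP"
```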
> We've got about 700 top-level directories for user accounts and research
> projects. We'll probably run an individual backup job for each group of
> directories alphabetically (A*, B*, etc.), so that the 400TB will be spread
> (unevenly) across about 45 Bacula jobs.
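To get a feel for how uneven that per-letter split will be before
carving up the jobs, one could count the top-level directories by
initial letter; the path below is a placeholder:

```shell
#!/bin/sh
# Count top-level directories per (upper-cased) initial letter, to see
# how the ~45 alphabetical Bacula jobs would be loaded.
TOP=/gpfs/gpfs0/home    # hypothetical path to the ~700 top-level dirs
ls -1 "$TOP" 2>/dev/null | cut -c1 | tr '[:lower:]' '[:upper:]' | sort | uniq -c
```

Directory counts say nothing about size, of course; a per-group du -s
would show how the 400TB itself is spread.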
This MOSTLY seems sound, with the proviso that I am not familiar with
the details of GPFS. But I've implemented similar schemes on top of
Solaris' ZFS.
Backing up from snapshots is a sound plan: it entirely sidesteps the
problem of files changing while they are being backed up. Does GPFS
offer you a way to create incremental snapshots containing only the
changes since a stipulated previous snapshot?
That might be a way to get viable intermediate incremental or
differential backups.
--
Phil Stracchino
Babylon Communications
[email protected]
[email protected]
Landline: +1.603.293.8485
Mobile: +1.603.998.6958
_______________________________________________
Bacula-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-users