Package: debian-reference
Version: 1.08-2
Severity: wishlist
Tags: patch

Greetings Osamu and everyone who works on debian-reference.
Thank you for your wonderful package.  A paragraph in section 8.4
reads:

   Combination of one of these with the archiving method described in
   Copy and archive a whole subdirectory, Section 8.3 and the
   automated regular job described in Schedule activity (cron, at),
   Section 8.6.27 will make a nice backup system.

This document is about creating such a system and is meant for
inclusion in debian-reference.  It uses regular GNU tar rather than
pdumpfs or subversion.  I've looked at many of the 'backup utilities',
but these systems have thus far proved too complicated for me or not
flexible enough to fill my needs.  This work is partly inspired by
http://www.bluelavalamp.net/backerupper/ (which did not work for me,
because when you are backing up 600 GB the last thing you are going
to be able to do is make a complete archive of everything *before*
scp'ing it somewhere) and by the tar info page, which has a great
section on using tar for backups.

This is still a bit of a work in progress, so feel free to ask for
changes.  This system is for people for whom all those CD/DVD based
systems would be too time consuming and expensive [600 GB ~= 1000
CDs], who can't afford a tape based system, but who do have an old
system lying around and some extra hard disks (or could afford the
one-time expense of some new disks).  :) I reserve the right to turn
this entirely into a doc about using a pre-canned backup system if I
can find one that fits my needs. :)  Maybe rsync...

I expect this information to fit in under the current 8.3 and 8.4
sections.  I am writing this specifically for debian-reference so it
is naturally available under the GPL, or whatever we need it to be
under.  Change what you will to make it work for debian-reference.  I
want to be a contributor, I want to help out.

This document has been created using outlines in emacs.  Consequently
the headings can be folded and more easily managed.




* moving files

[8.3 additions   First, some stuff that could be added to section 8.3.
Note that I use `tar c .` instead of `tar cf - .`, as GNU tar uses
stdout by default.]

Sometimes when dealing with special files, permissions, date/time
stamps, and ownerships, cp and scp will not work properly.  At times
like that you can turn to tar as the most robust way of moving files
from place to place.

Copy a filesystem from one part of the tree to another:
$ cd /fromdir && tar c . | (cd /todir && tar x)

Or to a different machine:
$ tar c . | ssh muggles "cd /todir && tar x"

[See section 9.5 for more information on ssh, and especially 9.5.3, as
many of these commands will fail unless ssh is able to authenticate
without user interaction.]
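
[To verify non-interactive authentication before relying on it, use
BatchMode, which makes ssh fail rather than ask for a password:
$ ssh -o BatchMode=yes chaljin true && echo key auth works ]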

Or from a different machine:
$ ssh chaljin "cd /fromdir && tar c ." | tar x

Tar can also be used to confirm that two subdirectories are identical:
$ cd /fromdir && tar c . | (cd /todir && tar d)


* pipes

[But here I start diverging from 8.3 by talking about dd and pipes.
Really, I'm giving background for the machinery in the next section.
I don't think there is a section on pipes yet so: ]

 [ 8.3&1/2 Using pipes ]

A dd conduit between systems:
$ ssh chaljin "dd if=bin/iSilo386 bs=10k" | dd of=iSilo

Receive a tar file from another system:
$ ssh chaljin "tar c bin" | dd bs=10k of=test.tar

From one tar to another:
$ ssh chaljin "tar c bin" | tar x

A dd conduit between two foreign systems:
$ ssh chaljin "dd if=bin/iSilo386 bs=10k" | ssh muggles "dd of=iSilo"

Transfer the contents of the bin directory between two foreign
systems.  The pipe goes through the local system, so it's not
efficient for big backups; this is only to demonstrate its flexibility:
$ ssh chaljin "cd bin; tar c ." | ssh muggles "cd v; tar x"
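
If chaljin can authenticate to muggles directly, you can instead start
the pipe on chaljin and keep the data off the local system entirely
(note the nested quoting):
$ ssh chaljin "cd bin; tar c . | ssh muggles 'cd v; tar x'"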


A pipe is very flexible.  You can pick any From Side for the pipe and
combine it with any To Side, maximizing your options (a combined
example follows the lists below).


** From Side Examples:

A file/device on the local system.
  dd if=/my/file.tar bs=10k |

Or.
  cat file.tar |

A file/device on a foreign system.
  ssh chaljin "dd if=bin/iSilo386.exe bs=10k" |

The output of tar.
  tar c . |

The output of tar on a foreign system in a different directory.
  ssh chaljin "cd /from/dir; tar c ." |

Real world example:
  ssh [EMAIL PROTECTED] "cd /etc/cron;
      tar -c \
      -g snapshot \
      -X exclude \
      / | dd obs=20KiB"  |


** To Side Examples:

A local file/device.
  | dd of=/my/newfile

Append to a foreign file.
  | ssh muggles "dd >> /myforeign/newfile"

Unpack a tarball on a foreign system in a different directory.
  | ssh muggles "cd /to/dir; tar x"

Real world example:
  | dd of=/home/backerupper/chaljin/$(date +%Y-%m-%d).tar
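
Combining one From Side with one To Side gives, for instance, a
remote directory archived straight into a local file:

  ssh chaljin "cd /from/dir; tar c ." | dd of=/my/newfile.tar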


* Tutorial: Running it all from cron

[8.4 additions]

People often tell you to make backups of your system.  This document
details /how/ to back up and restore your systems with minimal fuss.

** Justification

[I don't recommend including the justification in debian-reference.
It's too wordy and I don't think anyone will care.]

As a slight side track, or snake's hands, I will attempt to justify
letting the backerupper machine have full root access to the clients
being backed up.  Some argue that this is a security bug and want each
client to control what is backed up, how, and when.  Doing so adds
considerable complexity to the scheme: each new client needs to be
configured, and if the client has a different operating system the
entire scheme must be ported or rewritten for the new OS, whereas
doing it all on the server eliminates the need for most of these
things.  The point of not giving the backerupper root level access is
to prevent a security compromise of backerupper from propagating to
all the clients.  Unfortunately, withholding explicit root access does
not prevent anyone who has root access on backerupper from gaining
root access to all the clients.  If backerupper is your secure logger
and offers no access or services, the additional risk is reduced.
Perhaps it is even better than allowing each client file access to
backerupper and trying to avoid the possibility of overwriting
attacks.  Only by encrypting the data on backerupper might the tables
be turned on this argument.

1. Essential client security files are on backerupper.  If /etc/passwd
   and /etc/shadow are delivered to backerupper it is possible to
   extract passwords using a tool such as 'john the ripper'.

2. Once backerupper is compromised any restores done from it could
   introduce trojans or back doors, thereby compromising the client.

I believe that using a secure logger and backerupper is in fact more
secure than using an intelligent client model, but I am not a security
professional and offer this as merely my opinion, so naturally I can
guarantee nothing.  If you consider the sensitivity of the clients
with respect to the backerupper under any model, the backerupper is
both the more sensitive and the easier to secure.

** Background

There are many backup systems available.  Unfortunately, I'm very
simple and they were either too complex for me to fathom or not
flexible enough for my needs.  If you have a data set containing
hundreds of gigabytes or complex database driven services then this
backup method could be for you.  It uses commonly available tools and
hardware, mainly GNU tar and a spare computer.  This method doesn't
create duplicates of the data before archiving, so it works when
backing up a terabyte RAID that only has 10 MB of free space.

A backup archive can be stored anywhere but I suggest using the hard
disk of a separate machine as disk space is both reusable and
inexpensive.  Using a separate system adds an additional level of
security protecting your data.  I use only one machine that functions
as a backup system to all my other computers.  This backup system
makes a good secure logger [1] if you need such a thing.

The directories on a Debian system to back up are /var, /home, /etc,
/root, and possibly /opt and /usr/local, or any other filesystem you
hand tweaked, for instance /boot if you are using a custom kernel, or
/usr/src if you do custom build work there.  As long as you have
followed the Filesystem Hierarchy Standard [2] this list will be quite
short, and it prevents you from having to back up any system managed
executables or files.
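
To get a rough idea of how large the resulting archive will be, you
can total these directories first (run as root so everything is
readable):

  # du -shc /etc /root /home /var /opt /usr/local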

If you would like to ease bare metal recovery, save some additional
information about the target system in the archive data directory:
partition information, installed hardware, and needed drivers,
especially those for the drive subsystem and network access.


** Requirements and Procedure

The client system needs to support ssh and scp, and have a copy of
GNU tar installed.  Everything else is handled by the backup system,
which will need cron, tar, ssh, and scp.

In our example, the backup machine is named 'muggles' and the server
being backed up is named 'chaljin'.  Create a 'backerupper' user on
the backup machine and generate an ssh key for him.  Put his public
key in root's authorized_keys file on the machine to be backed up.
[See section 9.5.3]  This makes him equivalent to root on the target
system; he will need this access to save and restore your critical
data.
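
A minimal sketch of that setup, run on muggles (give the key an empty
passphrase so cron can use it unattended):

  muggles # adduser backerupper
  muggles # su - backerupper
  muggles $ ssh-keygen -t rsa
  muggles $ ssh root@chaljin "mkdir -p /root/.ssh; \
      cat >> /root/.ssh/authorized_keys" < ~/.ssh/id_rsa.pub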

Add these cron lines to /etc/crontab on your backup system.  The
first zeroes the snapshot file for the target machine, which triggers
a full backup; the second runs the backup script.  Here a full backup
is done quarterly and incremental backups are done weekly.

#m  h dom month dow user        command
03  1  1    */3  *  backerupper echo "" > /home/backerupper/chaljin/snapshot
04  1  *    *    0  backerupper /home/backerupper/chaljin/script.sh

There are several files needed for this system to operate.

  script.sh
  snapshot (generated by tar)
  exclude

muggles $ cat /home/backerupper/chaljin/script.sh
#!/bin/bash

# Save the debconf database.
ssh [EMAIL PROTECTED] "debconf-get-selections > /var/backup/debconf-selections"

# Save the package selections.  (The single quotes keep "*" from
# being expanded by the local shell.)
ssh [EMAIL PROTECTED] 'dpkg --get-selections "*" > /var/backup/dpkg-selections'

# This system runs some postgresql databases, so we save those to a
# stable state.  pg_dumpall runs as the postgres user while root's
# shell on the client handles the redirection.  Note that if you have
# large data, such as photographs, embedded in a postgres database,
# pg_dumpall will not preserve it; in that case add pg_dump commands
# for the individual databases.
ssh [EMAIL PROTECTED] "su - postgres -c pg_dumpall > /var/backup/pgdb.dump"

# Full and incremental backup of /etc /root /home /var /opt and
# /usr/local, minus anything matched by the exclude file, to the
# backup machine.  I typically don't use compression as much of the
# stuff in my archives doesn't compress well, and gzip doesn't deal
# very well with already compressed data.  [Running gzip on a
# compressed file is really slow and the file just gets larger.]  Add
# --gzip to the tar options below if your data does compress well.

BACKUP=$(date +%Y-%m-%d)
if [ -f /home/backerupper/chaljin/snapshot ];
then scp /home/backerupper/chaljin/snapshot [EMAIL PROTECTED]:/var/backup/snapshot;
fi
scp /home/backerupper/chaljin/exclude [EMAIL PROTECTED]:/var/backup/exclude
ssh [EMAIL PROTECTED] "tar -c \
      -g /var/backup/snapshot \
      -X /var/backup/exclude \
      /etc /root /home /var /opt /usr/local" \
   | dd of=/home/backerupper/chaljin/$BACKUP.tar
scp [EMAIL PROTECTED]:/var/backup/snapshot /home/backerupper/chaljin/snapshot
tar tvf /home/backerupper/chaljin/$BACKUP.tar \
   > /home/backerupper/chaljin/$BACKUP.listing
muggles $


[Side note to Osamu:  You might want to add a blurb about
'debconf-get-selections' and 'debconf-set-selections' to 6.4.9]
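
[For reference, the pair works like this; on Debian,
debconf-get-selections is in the debconf-utils package:
   # debconf-get-selections > debconf-selections
   # debconf-set-selections < debconf-selections ]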

[I've had problems with the blocking factor (-b) in tar, which seems
to have no impact when I use it.  `tar c /home | dd obs=10KiB | dd
bs=10KiB count=100` seems to indicate that the block size tar uses
(regardless of the -b$NUM option) is 512 bytes.]

In the storage directory I have /home/backerupper/chaljin/exclude .
This is copied to the client being backed up (chaljin in our example)
just before tar needs it.  Any file matching one of the patterns
listed in exclude will not make it into the archive; see tar's info
pages for details.  In this example we don't want to attempt backing
up the live postgres databases, as doing so would be unsuccessful;
they are handled by the pg_dumpall command in our script.sh .  Similar
adjustments need to be made for email, news, and other database driven
services.

muggles $ cat /home/backerupper/chaljin/exclude
/home/archive/*
/home/share/*
/var/lib/postgres/data/
muggles $


To be able to do a bare metal restore we'll need to know any needed
network drivers, the machine name, the partition layout, any logical
volume management, and the boot loader (such as grub, or yaboot on
powerpc).  For now, note this kind of information in a file
( muggles:/home/backerupper/chaljin/Notes.txt ) on the backup
system.
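
A sketch of gathering the basics in one pass (this assumes pciutils
is installed on chaljin so that lspci is available):

  muggles $ ssh root@chaljin "uname -a; fdisk -l; cat /etc/fstab; lspci" \
      > /home/backerupper/chaljin/Notes.txt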


** Full Restore

  To reconstruct the system after a catastrophic failure, install a
  base Debian system and then restore the rest from your backups.

  1. Reinstall Debian

  2. Restore information from backup machine.

tar extractions done by root preserve the owner and the permissions
of files:

     muggles $ cd /home/backerupper/chaljin/
     muggles $ dd < full.tar | ssh [EMAIL PROTECTED] "cd /; tar -x"
     muggles $ dd < inc1.tar | ssh [EMAIL PROTECTED] "cd /; tar -x"
     muggles $ dd < inc2.tar | ssh [EMAIL PROTECTED] "cd /; tar -x"
     muggles $ ssh [EMAIL PROTECTED]
     chaljin # apt-get update
     chaljin # dpkg --set-selections </var/backup/dpkg-selections
     chaljin # debconf-set-selections </var/backup/debconf-selections
     chaljin # su - postgres
     chaljin $ psql -f /var/backup/pgdb.dump template1; exit

At this point you might need to iron out any changes that have
occurred in Debian since you last updated your server.  Running
'aptitude' or 'dselect' at this point might be wise.  Also, if Debian
were perfect the packaging system would be able to deal with
administrator modified configuration files; as it is, you will need
to iron out the conflicts just as one must when performing any normal
upgrade.  If you are restoring a powerpc based Debian, check that
your new partitions match what you have in /etc/yaboot.conf before
running ybin; you might need to wait until after you finish the
upgrade to do this.

     chaljin # apt-get upgrade



** Partial Restore

  From our archive listing
  (/home/backerupper/chaljin/2005-02-06.listing) we learn which files
  we want to restore.  For me this file is greater than 50 MB, so
  only use an editor, such as vim, which is capable of dealing with
  such a large text file.  Emacs lags a bit but can handle it; do not
  try this in MSWord unless you wanted to reboot your machine anyway
  or can afford to wait for a long time.  If you only want to restore
  a few files, I suggest extracting the files from the archive on the
  backup system and copying them by hand to the target.

  muggles $ cd /home/backerupper/chaljin/
  muggles $ mkdir scratch; cd scratch
  muggles $ tar xf ../2005-02-06.tar etc/netatalk home/frodo/important.txt
  muggles $ scp -r * [EMAIL PROTECTED]:/

  (Member names lack the leading '/' because GNU tar strips it when
  creating the archive.)

  Or, if you need to preserve timestamps & permissions:

  muggles $ tar c . | ssh [EMAIL PROTECTED] "cd /; tar x"

  If you have a moderately large set you may be better off telling tar
  which files you want by listing them in a file.  This example will
  extract all files whose complete path name matches one of the
  patterns in 'list'.

  muggles $ cat list
  *thesis*
  home/cira/blue
  home/cami/red
  muggles $ tar xf 2005-02-06.tar -T list


** Looking up old data without actually restoring it

  At this point I use tar to search for and pull out files I'm
  interested in.  It's slow operating on 20-120 GB archives, but I
  just live with it at this point.  One possibility here might be to
  extract the archive into some kind of compressed file system,
  perhaps cloop (see the cloop-utils package) or squashfs-tools.
  File-roller gives you a nice GUI to look through the tarballs with,
  but again it's slow working on large tar files.  For me, it took
  more than 60 minutes to open one of my 120 GB full dumps in
  file-roller.
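
  One shortcut is to grep the listing the script saved, instead of
  scanning the archive itself, and then extract just the member you
  want (the file name here is made up for illustration):

  muggles $ grep thesis 2005-02-06.listing
  muggles $ tar xf 2005-02-06.tar home/cira/thesis.tex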


* Notes

[I've tried to make these commands as simple as possible by
presuming the GNU utilities of Debian rather than trying to
make them work with other versions of the programs.]

[1] A secure logger runs no services, offers no open ports and is
    connected to the system it is watching by a dedicated network
    port.  Basically, there is no access except physical access.  Now
    it's up to you to provide the physical security.
    http://www.tldp.org/HOWTO/Security-HOWTO/secure-prep.html#logs

[2] FHS, Filesystem Hierarchy Standard http://www.pathname.com/fhs/
    is the document which explains what goes where and why in a
    unix filesystem.




-- System Information:
Debian Release: 3.1
  APT prefers unstable
  APT policy: (500, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.10-1-686
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages debian-reference depends on:
ii  debian-reference-en           1.08-2     Debian system administration guide

-- no debconf information

-- 
   .''`.          /\/'`\         [EMAIL PROTECTED]
  : :' :        .::/:::::..    .  irc://fslc.usu.edu/#cira
  `. `'    ) .//::(:###( )::.._/^  gps:41°45'N 111°49'W
    `- ..:@://"  ,|)   _/.        gpg:1024D/A7AAF777

