On 07/27/2011 01:34 AM, Nico Kadel-Garcia wrote:
On Tue, Jul 26, 2011 at 2:33 AM, Andy Canfield <andy.canfi...@pimco.mobi> wrote:
For your information, this is my backup script. It produces a zip file that
can be transported to another computer. The zip file unpacks into a
repository collection, giving, for each repository, a hotcopy of the
repository and a dump of the repository. The hotcopy can be reloaded on a
computer with the same characteristics as the original server; the dumps can
be loaded onto a different computer. Comments are welcome.
Andy, can we love you to pieces for giving us a new admin to educate
in subtleties?
Sure! I'm good at being ignorant. FYI, I have a BS in Computer Science from about 1970 and an MS in Operations Research from 1972, and I worked in Silicon Valley until I moved to Thailand in 1990. So although I am not stupid, I can be very ignorant.

And also the IT environment here is quite different. For example, MySQL can sync databases if you've got a 100Mbps link. Ha ha. I invented a way to sync two MySQL databases hourly over an unreliable link that ran at about modem speeds. I can remember making a driver climb a flagpole to make a cell phone call because the signal didn't reach the ground. To this day we run portable computers out in the field and communicate via floppynet. In this region hardware costs more than people, and software often costs nothing.

#! /bin/bash

# requires root access
if [ "$(whoami)" != root ]
then
    sudo "$0" "$@"
    exit $?
fi

# controlling parameters
SRCE=/data/svn
ls -ld $SRCE
DEST=/data/svnbackup
APACHE_USER=www-data
APACHE_GROUP=www-data
Unless the repository is readable only by root, this should
*NOT* run as root. Seriously. Never do things as the root user that
you don't have to. If the repository owner is "svn" or "www-data" as
you've described previously, execute this as the relevant repository
owner.
There are reasonable justifications for running it as root:
[1] Other maintenance scripts must be run as root, and this puts all maintenance in a central pool. My maintenance scripts are crontab jobs of the form /root/bin/TaskName.job, which runs /root/bin/TaskName.sh and pipes all stderr and stdout to /root/TaskName.out. Thus I can skim /root/*.out and have all the job status information at my fingertips.
[2] For some tasks, /root/bin/TaskName.job is also responsible for appending /root/TaskName.out to /root/TaskName.all so that I can see earlier outputs. There is a job that erases /root/*.all on the first of every month.
[3] I have heard for a long time never to run a GUI as root. None of these maintenance scripts are GUI.
[4] There are many failure modes that will only arise if it is run as non-root. For example, if run as root, the command "rm -rf /data/svnbackup" will absolutely, for sure, get rid of any existing /data/svnbackup, whoever it is owned by, whatever junk is inside it.
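For what it's worth, that .job wrapper is nothing more than a couple of lines; here is a sketch using my naming convention, with the SVNBackup name purely illustrative:

    #! /bin/bash
    # Hypothetical /root/bin/SVNBackup.job following my TaskName.job convention.
    # Run the real script and capture everything it prints for later skimming.
    /root/bin/SVNBackup.sh > /root/SVNBackup.out 2>&1
    # For tasks where I keep history, append this run's output to the .all file.
    cat /root/SVNBackup.out >> /root/SVNBackup.all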

# Construct a new empty SVNParent repository collection
rm -rf $DEST
mkdir $DEST
chown $APACHE_USER $DEST
chgrp $APACHE_GROUP $DEST
chmod 0700 $DEST
ls -ld $DEST
And do..... what? You've not actually confirmed that this has succeeded
unless you do something if these bits fail.
Many of your comments seem to imply that this script has not been tested. Of course it's been tested already, and in any production environment it will be tested again. And if stdout and stderr are piped to /root/SVNBackup.out then I can check that output text reasonably often and see that it is still running. In this case I would check it daily for a week, weekly for a month or two, yearly forever, and every time somebody creates a new repository.
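That said, adding an explicit abort would cost almost nothing. A sketch of the kind of check Nico is asking for (not what the script currently does):

    # Hypothetical guard: refuse to continue if the backup area cannot be rebuilt.
    rm -rf "$DEST"
    if ! mkdir "$DEST"
    then
        echo "SVNBackup: could not create $DEST -- aborting" >&2
        exit 1
    fi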

Also, by the standards of this part of the world, losing a day's work is not a catastrophe. Most people can remember what they did, and do it again, and it probably only takes a half-day to redo.

# Get all the names of all the repositories
# (Also gets names of any other entry in the SVNParent directory)
cd $SRCE
ls -d1 * > /tmp/SVNBackup.tmp
And *HERE* is where you start becoming a dead man if mkdir $DEST
failed. I believe that it works in your current environment, but if
the parent of $DEST does not exist, you're now officially in deep
danger executing these operations in whatever directory the script was
run from.
As noted above, $DEST is /data/svnbackup. The parent of $DEST is /data. /data is a partition on the server. If that partition is gone, that is exactly the kind of failure we're talking about recovering from.
# Process each repository
for REPO in `cat /tmp/SVNBackup.tmp`
And again you're in trouble. If any of the repositories have
whitespace in their names, or funky EOL characters, the individual
words will be parsed as individual arguments.
This is Linux. Anyone who creates a repository with white space in the name gets shot.
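(For the record, a whitespace-safe version of that loop would read the list line by line instead of word-splitting it; a rough sketch, not what I actually run:)

    # Hypothetical whitespace-safe variant: one repository name per line.
    while IFS= read -r REPO
    do
        echo "Would back up: $SRCE/$REPO"
    done < /tmp/SVNBackup.tmp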

do
    # some things are not repositories; ignore them
    if [ -d $SRCE/$REPO ]
Here is a likely bug in the script. I treat every subdirectory of the SVNParent repository collection as if it were a repository. But it might not be. There might be valid reasons for having a different type of subdirectory in there. Probably this line should read something like
    if [ -d $SRCE/$REPO/hooks ]
    then
        # ... back up the repository ...
        :
    else
        # ... just copy it over ...
        :
    fi

    then
        # back up this repository
        echo "Backing up $REPO"
        # use hotcopy to get an exact copy
        # that can be reloaded onto the same system
        svnadmin  hotcopy  $SRCE/$REPO   $DEST/$REPO
        # use dump to get an inexact copy
        # that can be reloaded anywhere
        svnadmin  dump     $SRCE/$REPO > $DEST/$REPO.dump
    fi
done
See above. You're not reporting failures, in case the repository is
not from a Subversion release compatible with the current "svnadmin"
command. (This has happened to me when someone copied a repository to
a server with older Subversion.)
Yes. But then the failure was in setting up the repository, not in backing it up. Perhaps I should run
    svnadmin verify $SRCE/$REPO
first and take appropriate action if it fails. Oh, please don't tell me that 'svnadmin verify' doesn't really verify completely!
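Something like this inside the loop, perhaps (just a sketch of the idea, not what the script does today):

    # Hypothetical: verify first, and skip -- but report -- any repository that fails.
    if svnadmin verify --quiet $SRCE/$REPO
    then
        svnadmin hotcopy $SRCE/$REPO $DEST/$REPO
        svnadmin dump --quiet $SRCE/$REPO > $DEST/$REPO.dump
    else
        echo "WARNING: svnadmin verify failed for $REPO; not backed up" >&2
    fi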

On another point, "reporting failures" ought to mean "sending e-mail to the sysadmin telling him that it failed." I've been trying to do that for years and cannot. I cannot send e-mail to an arbitrary target e-mail address u...@example.com from a Linux shell script.
* Most approaches require 'sendmail', notoriously the hardest program on the planet to configure.
* I found that installing 'sendmail', and not configuring it at all, prevented apache from starting at boot time. Probably something wrong with init.
* Much of the documentation on sendmail only covers sending e-mail to an account on that server computer, not to u...@example.com elsewhere in the world. As if servers were timesharing systems.
* Sendmail has to talk to an SMTP server. In the past couple of years it seems as if all the SMTP servers in the world have been linked into an authorization environment to prevent spam. So you can't just run your own SMTP server - it's not certificated.
* Thunderbird knows how to log in to an SMTP server; last time I looked, sendmail did not.

Without e-mail, any notification system requires my contacting the machine, rather than the machine contacting me. And that is unreliable.
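If I ever crack this, it will probably be by skipping sendmail entirely and talking to an authenticated SMTP relay directly. I gather curl can do that; roughly like this (untested here, and the server name, account, and addresses are placeholders):

    # Hypothetical: push a prepared message through an authenticated SMTP relay with curl.
    # The uploaded file should start with To:, From:, and Subject: headers.
    curl --ssl-reqd --url "smtps://smtp.example.com:465" \
         --user "backupuser:password" \
         --mail-from "backup@example.com" \
         --mail-rcpt "sysadmin@example.com" \
         --upload-file /root/SVNBackup.out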

# Show the contents
echo "Contents of the backup:"
ls -ld $DEST/*
This is for /root/SVNBackup.out. It lists the repositories that have been included in the backup.

Indeed, the above line that reads
    echo "Backing up $REPO"
only exists because hotcopy outputs progress info. I tried "--quiet" and it didn't shut up. Maybe "-q" works.
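If the chatter is actually coming from the dump step (which prints one "* Dumped revision N." line per revision to stderr), then its --quiet flag should do it:

    # Hypothetical: silence the per-revision progress from the dump step.
    svnadmin dump --quiet $SRCE/$REPO > $DEST/$REPO.dump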

# zip up the result
cd $DEST
zip -r -q -y $DEST.zip .
Don't use zip for this. zip is not installed by default on a lot of
UNIX and Linux systems; tar and gzip are, and they give better compression.
Just about every uncompression suite in the world supports .tgz files
as gzipped tarfiles, so it's a lot more portable.
The 'zip' program is installable on every computer I've ever known. And, at least until recently, there were LOTS of operating systems that did not support .tar.gz or .tar.bz2 or the like. IMHO a zipped file is a lot more effectively portable. And the compression ratio is close enough that I'm willing to get 15% less compression for the portability.
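For reference, if I ever do switch, the tgz equivalent of the zip step would presumably be a one-liner along these lines:

    # Hypothetical tgz alternative; tar keeps symlinks as symlinks by default, like zip -y.
    tar -czf $DEST.tgz -C $DEST .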

Also, the script has ignored the problems of symlinks. You may not use
them, but a stack of people use symlinked files to pre-commit scripts,
password files, or other tools among various repositories from an
outside source. If you haven't at least searched for and reported
symlinks, you've got little chance of properly replicating them for
use elsewhere.
My guess is that there are two types of symlinks: those that point inside the repository and those that point outside the repository. Those that point inside the repository should be no problem. Those that point outside the repository are bad because there is no guarantee that the thing pointed to exists on any given machine that you use.

And AFAIK svnadmin hotcopy and svnadmin dump preserve symlinks as such, and that is the best that I can do in either case.

Also, this is the kind of thing where you back up the symlink and later, if we must restore, some human being says "What does this symlink point to?"
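Reporting them at backup time, as suggested, would be cheap enough; something like:

    # Hypothetical: list any symlinks in the repository collection so they show up in the log.
    find $SRCE -type l -ls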
# Talk to the user
echo "Backup is in file $DEST.zip:"
ls -ld $DEST.zip
It looks like you're relying on "ls -ld"
Again, this is a more-or-less standard part of my scripts, there to put information into the /root/SVNBackup.out file. All of my backup scripts do this. Sometimes I look and say to myself "Why did the backup suddenly triple in size?" and dig around and discover that some subdirectory was added that should not have been present.

# The file $DEST.zip can now be transported to another computer.
And for a big repository, this is *grossly* inefficient. Transmitting
bulky compressed files means that you have to send the whole thing in
one bundle, or incorporate wrappers to split it into manageable
chunks. This gets awkward as your Subversion repositories grow, and
they *will* grow because Subversion really discourages discarding
*anything* from the repositories.
A backup file that is created on an attached portable disk does not need to be transported.

A backup file that is transmitted over a LAN once a day is not too big, no matter how big; 3 hours is a reasonable time frame for transport.

Historically I ran a crontab job every morning at 10AM that copied a backup file to a particular workstation on the LAN. By 10AM that workstation is turned on, and if it slows down, well, the lady who uses it is not technically minded enough to figure out WHY it's slowing down. And it was only a few megabytes.

Yeah, a zip of the entire SVNParent repository collection might be too big to send over the Internet.
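If it ever comes to that, splitting the archive into fixed-size pieces for transport is straightforward:

    # Hypothetical: break the archive into 100 MB pieces; reassemble later with cat.
    split -b 100M $DEST.zip $DEST.zip.part.
    # cat $DEST.zip.part.* > $DEST.zip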

Oh yes, one more thing. Using svnadmin in various ways it is possible to purge old revisions from a repository. I would expect that we do that periodically, maybe once a year. If we're putting out version 5.0 of something, version 3.0 should not be in the repository, it should be in an archive.
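I assume that purge would be done by dumping only the revisions we want to keep and loading them into a fresh repository, roughly (revision numbers and paths are made up):

    # Hypothetical purge of old history: keep revisions 5000 onward in a new repository.
    svnadmin dump -r 5000:HEAD /data/svn/REPO > /tmp/REPO-recent.dump
    svnadmin create /data/svn/REPO.new
    svnadmin load /data/svn/REPO.new < /tmp/REPO-recent.dump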
I'd very strongly urge you to review
the use of "svnsync" to mirror the content of the repository to
another server on another system, coupled with a wrapper to get any
non-database components separately. This also reduces "churn" on your
drives, and can be so much faster that you can safely run it every 10
minutes for a separate read-only mirror site, a ViewVC or Fisheye
viewable repository, or publication of externally accessible
downloadable source.
I shy away from svnsync right now because it requires me to get TWO of these Subversion systems running. At present I am almost able to get one running. Almost.
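For future reference, and assuming I've read the docs correctly, the basic svnsync setup is roughly: create an empty mirror repository, allow revision property changes on it, initialize it against the master, then synchronize on a schedule. Paths and URLs here are placeholders:

    # Hypothetical svnsync mirror setup.
    svnadmin create /data/svnmirror/REPO
    printf '#!/bin/sh\nexit 0\n' > /data/svnmirror/REPO/hooks/pre-revprop-change
    chmod +x /data/svnmirror/REPO/hooks/pre-revprop-change
    svnsync initialize file:///data/svnmirror/REPO http://server/svn/REPO
    svnsync synchronize file:///data/svnmirror/REPO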

As harsh as I'm being, Andy, it's actually not bad for a first shot by
someone who hasn't been steeped in the pain of writing industry-grade
code like some of us. For a one-off in a very simple environment it's
fine; as something to get this week's backups done while you think about
a more thorough tool, it's reasonable, except for the big booby trap
David Chapman pointed out about using the hotcopies, not the active
repositories, for zipping up.
Thank you. I think the key phrase here is "a very simple environment". How much do we pay for a server? 400 dollars. One guy recommended buying a server for 4,000 dollars and he was darned near fired for it.

I fixed the booby trap already. Your comments will lead to some other changes. But not, for now, a second computer.

OH! I thought of something else!

Suppose we do a backup every night at midnight, copying it to a safe place. And suppose that the server dies at 8PM Tuesday evening. Then all the commits that occurred on Tuesday have been lost. Presumably we'd find out about this on Wednesday.

But a working copy is a valid working copy until you delete it. Assuming that the working copies still exist, all we need to do is:
* Restore the working SVNParent repository collection on a replacement computer (see the restore sketch below).
* Have everyone 'svn commit' from their working copies.
* Unscramble the merge problems, which should be few.
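The restore step itself would be either putting the hotcopy back into place (same kind of machine) or loading the dump (different machine or different Subversion version); something like this, with illustrative paths:

    # Hypothetical restore on a replacement computer, from the backup zip.
    unzip /data/svnbackup.zip -d /data/svnbackup
    # Same kind of machine: move the hotcopy straight into place.
    cp -a /data/svnbackup/REPO /data/svn/REPO
    # Different machine: create a fresh repository and load the dump instead.
    svnadmin create /data/svn/REPO
    svnadmin load /data/svn/REPO < /data/svnbackup/REPO.dump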

This becomes feasible if nobody deletes their working copy until 48 hours after their last commit. And my guess is that people will do that naturally. People who are working on the package will keep one working copy indefinitely, updating it but not checking out a whole new one. People who do only brief work on the package will not purge the working copy until they start worrying about disk space.

Thank you very much.
