On 07/27/2011 01:34 AM, Nico Kadel-Garcia wrote:
On Tue, Jul 26, 2011 at 2:33 AM, Andy Canfield <andy.canfi...@pimco.mobi> wrote:
For your information, this is my backup script. It produces a zip file that
can be transported to another computer. The zip file unpacks into a
repository collection, giving, for each repository, a hotcopy of the
repository and a dump of the repository. The hotcopy can be reloaded on a
computer with the same characteristics as the original server; the dumps can
be loaded onto a different computer. Comments are welcome.
Andy, can we love you to pieces for giving us a new admin to educate
in subtleties?
Sure! I'm good at being ignorant. FYI, I have a BS in Computer Science from about 1970 and an MS in Operations Research from 1972, and I worked in Silicon Valley until I moved to Thailand in 1990. So although I am not stupid, I can be very ignorant.

And also the IT environment here is quite different. For example, MySQL can sync databases if you've got a 100Mbps link. Ha ha. I invented a way to sync two MySQL databases hourly over an unreliable link that ran at about modem speeds. I can remember making a driver climb a flagpole to make a cell phone call because the signal didn't reach the ground. To this day we run portable computers out in the field and communicate via floppynet. In this region hardware costs more than people, and software often costs nothing.

#! /bin/bash

# requires root access
if [ "$(whoami)" != root ]
then
    sudo "$0" "$@"
    exit $?
fi

# controlling parameters
SRCE=/data/svn
ls -ld $SRCE
DEST=/data/svnbackup
APACHE_USER=www-data
APACHE_GROUP=www-data
Unless the repository is readable only by root, this should
*NOT* run as root. Seriously. Never do things as the root user that
you don't have to. If the repository owner is "svn" or "www-data" as
you've described previously, execute this as the relevant repository
owner.
There are reasonable justifications for running it as root:
[1] Other maintenance scripts must be run as root, and this puts all maintenance in a central pool. My maintenance scripts are crontab jobs of the form /root/bin/TaskName.job, which runs /root/bin/TaskName.sh and pipes all stderr and stdout to /root/TaskName.out. Thus I can skim /root/*.out and have all the job status information at my fingertips.
[2] For some tasks, /root/bin/TaskName.job is also responsible for appending /root/TaskName.out to /root/TaskName.all so that I can see earlier outputs. There is a job that erases /root/*.all on the first of every month.
[3] I have heard for a long time never to run a GUI as root. None of these maintenance scripts are GUI.
[4] There are many failure modes that will only arise if it is run as non-root. For example, if run as root, the command "rm -rf /data/svnbackup" will absolutely, for sure, get rid of any existing /data/svnbackup, whoever it is owned by, whatever junk is inside it.
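For what it's worth, that .job wrapper is nothing more than a couple of lines; here is a sketch using my naming convention, with the SVNBackup name purely illustrative:

    #! /bin/bash
    # Hypothetical /root/bin/SVNBackup.job following my TaskName.job convention.
    # Run the real script and capture everything it prints for later skimming.
    /root/bin/SVNBackup.sh > /root/SVNBackup.out 2>&1
    # For tasks where I keep history, append this run's output to the .all file.
    cat /root/SVNBackup.out >> /root/SVNBackup.all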

# Construct a new empty SVNParent repository collection
rm -rf $DEST
mkdir $DEST
chown $APACHE_USER $DEST
chgrp $APACHE_GROUP $DEST
chmod 0700 $DEST
ls -ld $DEST
And do..... what? You've not actually confirmed that this has succeeded
unless you do something if these bits fail.
Many of your comments seem to imply that this script has not been tested. Of course it's been tested already, and in any production environment it will be tested again. And if stdout and stderr are piped to /root/SVNBackup.out then I can check that output text reasonably often and see that it is still running. In this case I would check it daily for a week, weekly for a month or two, yearly forever, and every time somebody creates a new repository.
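That said, adding an explicit abort would cost almost nothing. A sketch of the kind of check Nico is asking for (not what the script currently does):

    # Hypothetical guard: refuse to continue if the backup area cannot be rebuilt.
    rm -rf "$DEST"
    if ! mkdir "$DEST"
    then
        echo "SVNBackup: could not create $DEST -- aborting" >&2
        exit 1
    fi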

Also, by the standards of this part of the world, losing a day's work is not a catastrophe. Most people can remember what they did, and do it again, and it probably only takes a half-day to redo.

# Get all the names of all the repositories
# (Also gets names of any other entry in the SVNParent directory)
cd $SRCE
ls -d1 * > /tmp/SVNBackup.tmp
And *HERE* is where you start becoming a dead man if mkdir $DEST
failed. I believe that it works in your current environment, but if
the parent of $DEST does not exist, you're now officially in deep
danger executing these operations in whatever directory the script was
run from.
As noted above, $DEST is /data/svnbackup. The parent of $DEST is /data. /data is a partition on the server. If that partition is gone, that is exactly the kind of failure we're talking about recovering from.
# Process each repository
for REPO in `cat /tmp/SVNBackup.tmp`
And again you're in trouble. If any of the repositories have
whitespace in their names, or funky EOL characters, the individual
words will be parsed as individual arguments.
This is Linux. Anyone who creates a repository with white space in the name gets shot.
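(For the record, a whitespace-safe version of that loop would read the list line by line instead of word-splitting it; a rough sketch, not what I actually run:)

    # Hypothetical whitespace-safe variant: one repository name per line.
    while IFS= read -r REPO
    do
        echo "Would back up: $SRCE/$REPO"
    done < /tmp/SVNBackup.tmp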

do
    # some things are not repositories; ignore them
    if [ -d $SRCE/$REPO ]
Here is a likely bug in the script. I treat every subdirectory of the SVNParent repository collection as if it were a repository. But it might not be. There might be valid reasons for having a different type of subdirectory in there. Probably this line should read something like
    if [ -d $SRCE/$REPO/hooks ]
    then
        # ... back up the repository ...
        :
    else
        # ... just copy it over ...
        :
    fi

    then
        # back up this repository
        echo "Backing up $REPO"
        # use hotcopy to get an exact copy
        # that can be reloaded onto the same system
        svnadmin  hotcopy  $SRCE/$REPO   $DEST/$REPO
        # use dump to get an inexact copy
        # that can be reloaded anywhere
        svnadmin  dump     $SRCE/$REPO > $DEST/$REPO.dump
    fi
done
See above. You're not reporting failures, in case the repository is
not from a Subversion release compatible with the current "svnadmin"
command. (This has happened to me when someone copied a repository to
a server with older Subversion.)
Yes. But then the failure was in setting up the repository, not in backing it up. Perhaps I should run
    svnadmin verify $SRCE/$REPO
first and take appropriate action if it fails. Oh, please don't tell me that 'svnadmin verify' doesn't really verify completely!
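Something like this inside the loop, perhaps (just a sketch of the idea, not what the script does today):

    # Hypothetical: verify first, and skip -- but report -- any repository that fails.
    if svnadmin verify --quiet $SRCE/$REPO
    then
        svnadmin hotcopy $SRCE/$REPO $DEST/$REPO
        svnadmin dump --quiet $SRCE/$REPO > $DEST/$REPO.dump
    else
        echo "WARNING: svnadmin verify failed for $REPO; not backed up" >&2
    fi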

On another point, "reporting failures" ought to mean "sending e-mail to the sysadmin telling him that it failed." I've been trying to do that for years and cannot. I cannot send e-mail to an arbitrary target e-mail address u...@example.com from a Linux shell script.
* Most approaches require 'sendmail', notoriously the hardest program on the planet to configure.
* I found that installing 'sendmail', and not configuring it at all, prevented apache from starting at boot time. Probably something wrong with init.
* Much of the documentation on sendmail only covers sending e-mail to an account on that server computer, not to u...@example.com elsewhere in the world. As if servers were timesharing systems.
* Sendmail has to talk to an SMTP server. In the past couple of years it seems as if all the SMTP servers in the world have been linked into an authorization environment to prevent spam. So you can't just run your own SMTP server - it's not certificated.
* Thunderbird knows how to log in to an SMTP server; last time I looked, sendmail did not.

Without e-mail, any notification system requires my contacting the machine, rather than the machine contacting me. And that is unreliable.
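If I ever crack this, it will probably be by skipping sendmail entirely and talking to an authenticated SMTP relay directly. I gather curl can do that; roughly like this (untested here, and the server name, account, and addresses are placeholders):

    # Hypothetical: push a prepared message through an authenticated SMTP relay with curl.
    # The uploaded file should start with To:, From:, and Subject: headers.
    curl --ssl-reqd --url "smtps://smtp.example.com:465" \
         --user "backupuser:password" \
         --mail-from "backup@example.com" \
         --mail-rcpt "sysadmin@example.com" \
         --upload-file /root/SVNBackup.out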

# Show the contents
echo "Contents of the backup:"
ls -ld $DEST/*
This is for /root/SVNBackup.out. It lists the repositories that have been included in the backup.

Indeed, the above line that reads
    echo "Backing up $REPO"
only exists because hotcopy outputs progress info. I tried "--quiet" and it didn't shut up. Maybe "-q" works.
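If the chatter is actually coming from the dump step (which prints one "* Dumped revision N." line per revision to stderr), then its --quiet flag should do it:

    # Hypothetical: silence the per-revision progress from the dump step.
    svnadmin dump --quiet $SRCE/$REPO > $DEST/$REPO.dump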

# zip up the result
cd $DEST
zip -r -q -y $DEST.zip .
Don't use zip for this. zip is not installed by default on a lot of
UNIX and Linux systems; tar and gzip are, and they give better compression.
Just about every uncompression suite in the world supports .tgz files
as gzipped tarfiles, so it's a lot more portable.
The 'zip' program is installable on every computer I've ever known. And, at least until recently, there were LOTS of operating systems that did not support .tar.gz or .tar.bz2 or the like. IMHO a zipped file is a lot more effectively portable. And the compression ratio is close enough that I'm willing to get 15% less compression for the portability.
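For reference, if I ever do switch, the tgz equivalent of the zip step would presumably be a one-liner along these lines:

    # Hypothetical tgz alternative; tar keeps symlinks as symlinks by default, like zip -y.
    tar -czf $DEST.tgz -C $DEST .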

Also, the script has ignored the problems of symlinks. You may not use
them, but a stack of people use symlinked files to pre-commit scripts,
password files, or other tools among various repositories from an
outside source. If you haven't at least searched for and reported
symlinks, you've got little chance of properly replicating them for
use elsewhere.
My guess is that there are two types of symlinks: those that point inside the repository and those that point outside the repository. Those that point inside the repository should be no problem. Those that point outside the repository are bad because there is no guarantee that the thing pointed to exists on any given machine that you use.

And AFAIK svnadmin hotcopy and svnadmin dump preserve symlinks as such, and that is the best that I can do in either case.

Also, this is the kind of thing where you back up the symlink and later, if we must restore, some human being says "What does this symlink point to?"
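Reporting them at backup time, as suggested, would be cheap enough; something like:

    # Hypothetical: list any symlinks in the repository collection so they show up in the log.
    find $SRCE -type l -ls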
# Talk to the user
echo "Backup is in file $DEST.zip:"
ls -ld $DEST.zip
It looks like you're relying on "ls -ld"
Again, this is a more-or-less standard part of my scripts, there to put information into the /root/SVNBackup.out file. All of my backup scripts do this. Sometimes I look and say to myself "Why did the backup suddenly triple in size?" and dig around and discover that some subdirectory was added that should not have been present.

# The file $DEST.zip can now be transported to another computer.
And for a big repository, this is *grossly* inefficient. Transmitting
bulky compressed files means that you have to send the whole thing in
one bundle, or incorporate wrappers to split it into manageable
chunks. This gets awkward as your Subversion repositories grow, and
they *will* grow because Subversion really discourages discarding
*anything* from the repositories.
A backup file that is created on an attached portable disk does not need to be transported.

A backup file that is transmitted over a LAN once a day is not too big, no matter how big; 3 hours is a reasonable time frame for transport.

Historically I ran a crontab job every morning at 10AM that copied a backup file to a particular workstation on the LAN. By 10AM that workstation is turned on, and if it slows down, well, the lady who uses it is not technically minded enough to figure out WHY it's slowing down. And it was only a few megabytes.

Yeah, a zip of the entire SVNParent repository collection might be too big to send over the Internet.
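If it ever comes to that, splitting the archive into fixed-size pieces for transport is straightforward:

    # Hypothetical: break the archive into 100 MB pieces; reassemble later with cat.
    split -b 100M $DEST.zip $DEST.zip.part.
    # cat $DEST.zip.part.* > $DEST.zip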

Oh yes, one more thing. Using svnadmin in various ways it is possible to purge old revisions from a repository. I would expect that we do that periodically, maybe once a year. If we're putting out version 5.0 of something, version 3.0 should not be in the repository, it should be in an archive.
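I assume that purge would be done by dumping only the revisions we want to keep and loading them into a fresh repository, roughly (revision numbers and paths are made up):

    # Hypothetical purge of old history: keep revisions 5000 onward in a new repository.
    svnadmin dump -r 5000:HEAD /data/svn/REPO > /tmp/REPO-recent.dump
    svnadmin create /data/svn/REPO.new
    svnadmin load /data/svn/REPO.new < /tmp/REPO-recent.dump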
I'd very strongly urge you to review
the use of "svnsync" to mirror the content of the repository to
another server on another system, coupled with a wrapper to get any
non-database components separately. This also reduces "churn" on your
drives, and can be so much faster that you can safely run it every 10
minutes for a separate read-only mirror site, a ViewVC or Fisheye
viewable repository, or publication of externally accessible
downloadable source.
I shy away from svnsync right now because it requires me to get TWO of these Subversion systems running. At present I am almost able to get one running. Almost.
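For future reference, and assuming I've read the docs correctly, the basic svnsync setup is roughly: create an empty mirror repository, allow revision property changes on it, initialize it against the master, then synchronize on a schedule. Paths and URLs here are placeholders:

    # Hypothetical svnsync mirror setup.
    svnadmin create /data/svnmirror/REPO
    printf '#!/bin/sh\nexit 0\n' > /data/svnmirror/REPO/hooks/pre-revprop-change
    chmod +x /data/svnmirror/REPO/hooks/pre-revprop-change
    svnsync initialize file:///data/svnmirror/REPO http://server/svn/REPO
    svnsync synchronize file:///data/svnmirror/REPO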

As harsh as I'm being, Andy, it's actually not bad for a first shot by
someone who hasn't been steeped in the pain of writing industry-grade
code like some of us. For a one-off in a very simple environment it's
fine; as something to get this week's backups done while you think about
a more thorough tool, it's reasonable, except for the big booby trap
David Chapman pointed out about using the hotcopies, not the active
repositories, for zipping up.
Thank you. I think the key phrase here is "a very simple environment". How much do we pay for a server? 400 dollars. One guy recommended buying a server for 4,000 dollars and he was darned near fired for it.

I fixed the booby trap already. Your comments will lead to some other changes. But not, for now, a second computer.

OH! I thought of something else!

Suppose we do a backup every night at midnight, copying it to a safe place. And suppose that the server dies at 8PM Tuesday evening. Then all the commits that occurred on Tuesday have been lost. Presumably we'd find out about this on Wednesday.

But a working copy is a valid working copy until you delete it. Assuming that the working copies still exist, all we need to do is:
* Restore the working SVNParent repository collection on a replacement computer (see the restore sketch below).
* Have everyone 'svn commit' from their working copies.
* Unscramble the merge problems, which should be few.
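The restore step itself would be either putting the hotcopy back into place (same kind of machine) or loading the dump (different machine or different Subversion version); something like this, with illustrative paths:

    # Hypothetical restore on a replacement computer, from the backup zip.
    unzip /data/svnbackup.zip -d /data/svnbackup
    # Same kind of machine: move the hotcopy straight into place.
    cp -a /data/svnbackup/REPO /data/svn/REPO
    # Different machine: create a fresh repository and load the dump instead.
    svnadmin create /data/svn/REPO
    svnadmin load /data/svn/REPO < /data/svnbackup/REPO.dump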

This becomes feasible if nobody deletes their working copy until 48 hours after their last commit. And my guess is that people will do that naturally. People who are working on the package will keep one working copy indefinitely, updating it but not checking out a whole new one. People who do only brief work on the package will not purge the working copy until they start worrying about disk space.

Thank you very much.
