(Sorry for the even longer quote, but this details some useful
experiments that might become regression tests for a solution)
On 30-12-2010 00:55, Ersek, Laszlo wrote:
(Sorry for the long quote, I'd like to keep the context.)
On Wed, 29 Dec 2010, Jakob Bohm wrote:
I am having a problem with the behavior of --listed-incremental when
the tape of a single-volume archive fills up.
To avoid the backups on subsequent days becoming increasingly larger,
I want the incremental index to list the files that were actually
backed up before the tape was full. However, currently I seem to get
an index saying that nothing was backed up on the full tape (it's
uncompressed LTO-4, i.e. 800 GiB per tape).
One thing I have not tried (because it would create a multi-volume
archive with the second and later volumes missing) would be to
specify "--tape-length 800000000000 --info-script=/bin/false".
However, given the observed handling of a full tape, I would suspect
that doing this has the same result as a simple broken output
pipe[2].
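For concreteness, the untried variant would look something like the
sketch below (untested; the device and paths are illustrative, and
note that GNU tar counts --tape-length in units of 1024 bytes, so
roughly 781250000 of those correspond to 800 GB):

$ tar -c -v -f /dev/nst0 --listed-incremental=backup.snar \
    --tape-length=781250000 --info-script=/bin/false /srv/data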
1. This is on a Debian 5.0.x (Lenny) system with its corresponding
version of GNU tar, 1.20, but compiling another tar version to make
things work would not be much of a problem.
2. I am piping the output from tar through a double-buffering program
similar to the classic "buffer" program, in order to reduce
shoe-shining in the tape drive caused by the disk being slower than
the tape. The buffer is configured to buffer up about 792 MiB of
archive at a time, write lots of 100 KiB tape blocks continuously in
a burst at tape speed, then sleep until the buffer is full again (see
the pipeline sketch after these notes). Thus a full tape technically
manifests itself as a broken output pipe rather than a real disk-full
error code, in case that makes any difference to tar.
3. This is a production system, I really need the backup to work 5
days a week, 52 weeks/year. As each backup run may take up to 18
hours (2 hours lead time + 1 minute/burst), I cannot do much full
scale experimenting.
4. This is all done by a cron job, no human interaction possible.
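For reference, the pipeline sketched in note 2 can be approximated
with stock tools; mbuffer(1) stands in for my custom program, and the
device and paths are illustrative:

$ tar -b 200 -c --listed-incremental=backup.snar -f - /srv/data \
    | mbuffer -m 792M -s 100k -P 100 -o /dev/nst0

Here -b 200 makes tar emit 100 KiB records, -m and -s size the buffer
and the tape blocks, and -P 100 tells mbuffer to start writing only
once its buffer is completely full, giving the burst-then-sleep
behaviour described above.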
I tried to check the situation that I think you would be in if the
disk could keep up with the tape:
- --listed-incremental
- tape full signalled with -1/ENOSPC on write()
- single volume
I created ten 2 MB files filled with '\0' bytes (zf0 .. zf9) and a
5.5 MB ext2 filesystem image, which I loop-mounted (/mnt/tmp),
roughly as in the sketch below. Then I tried to create the
single-volume archive, without any pre-existing metadata file.
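(A sketch of the setup; mounting the image needs root:

$ for i in 0 1 2 3 4 5 6 7 8 9; do
>   dd if=/dev/zero of=zf$i bs=1M count=2
> done
$ dd if=/dev/zero of=img bs=1K count=5632    # ~5.5 MB image
$ /sbin/mke2fs -F img
$ mkdir -p /mnt/tmp
$ mount -o loop img /mnt/tmp
)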
$ rpm -q tar
tar-1.23-7.fc14.x86_64
$ ls -goh zf? img
-rw-------. 1 5.5M 2010-12-29 23:53:21 +0100 img
-rw-------. 1 2.0M 2010-12-29 23:43:13 +0100 zf0
-rw-------. 1 2.0M 2010-12-29 23:43:13 +0100 zf1
-rw-------. 1 2.0M 2010-12-29 23:43:13 +0100 zf2
-rw-------. 1 2.0M 2010-12-29 23:48:02 +0100 zf3
-rw-------. 1 2.0M 2010-12-29 23:48:02 +0100 zf4
-rw-------. 1 2.0M 2010-12-29 23:48:02 +0100 zf5
-rw-------. 1 2.0M 2010-12-29 23:52:00 +0100 zf6
-rw-------. 1 2.0M 2010-12-29 23:52:00 +0100 zf7
-rw-------. 1 2.0M 2010-12-29 23:52:00 +0100 zf8
-rw-------. 1 2.0M 2010-12-29 23:52:00 +0100 zf9
$ tar -c -v -f /mnt/tmp/z.tar --listed-incremental=z.snar zf*
zf0
zf1
zf2
tar: /mnt/tmp/z.tar: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
$ ls -goh z.snar
-rw-------. 1 0 2010-12-30 00:04:15 +0100 z.snar
That is, without -M, the level 0 backup fails when it encounters
ENOSPC, and the metadata file remains empty, even though two files
(zf0 and zf1) had been written completely.
I removed the partial tar file and the empty snar file, and repeated
the above with -M. At each prompt I simply removed the last volume.
$ tar -M -c -v -f /mnt/tmp/z.tar --listed-incremental=z.snar zf*
zf0
zf1
zf2
Prepare volume #2 for `/mnt/tmp/z.tar' and hit return:
./GNUFileParts.10717/zf2.1
zf3
zf4
zf5
Prepare volume #3 for `/mnt/tmp/z.tar' and hit return:
./GNUFileParts.10717/zf5.2
zf6
zf7
Prepare volume #4 for `/mnt/tmp/z.tar' and hit return:
./GNUFileParts.10717/zf7.3
zf8
zf9
$ ls -goh z.snar
-rw-------. 1 36 2010-12-30 00:08:56 +0100 z.snar
I did this in order to end up with a complete metadata file. Now I
tried to update it, again by backing up to a single-volume file (level
1):
$ rm /mnt/tmp/z.tar
$ touch zf[5-8]
$ cp z.snar z.snar.bak
$ tar -c -v -f /mnt/tmp/z1.tar --listed-incremental=z.snar zf*
zf5
zf6
zf7
tar: /mnt/tmp/z1.tar: Wrote only 2048 of 10240 bytes
tar: Error is not recoverable: exiting now
$ cmp z.snar z.snar.bak
Thus the snar file was not updated, even though two files were
archived. I removed the partial single-volume level 1 archive and
retried the above (against the unchanged snar file), this time
backing up to a multi-volume level 1 archive. Again, I "changed"
volumes with "rm".
$ tar -M -c -v -f /mnt/tmp/z1.tar --listed-incremental=z.snar zf*
zf5
zf6
zf7
Prepare volume #2 for `/mnt/tmp/z1.tar' and hit return:
./GNUFileParts.10767/zf7.1
zf8
$ cmp z.snar.bak z.snar
z.snar.bak z.snar differ: char 23, line 2
This time the metadata file was updated correctly.
Until this point, I believe, this experiment has shown that without
-M, the snar file won't be updated if tar runs out of space, even if
you omit the buffering application and write directly to the tape. So
adding --tape-length=XXXX (which implies -M) can't put you in a worse
situation than you're presently in. The question is whether passing -M
and then refusing to provide blank media would create a usable snar
file (for level 0) or update it (for higher levels):
$ rm z.snar* /mnt/tmp/z1.tar
$ tar -M -c -v -f /mnt/tmp/z.tar --listed-incremental=z.snar zf*
zf0
zf1
zf2
Prepare volume #2 for `/mnt/tmp/z.tar' and hit return: q
tar: No new volume; exiting.
tar: WARNING: Archive is incomplete
tar: Error is not recoverable: exiting now
$ ls -goh z.snar
-rw-------. 1 0 2010-12-30 00:21:39 +0100 z.snar
The answer is negative. You really need to complete the entire backup
process to get a usable index, regardless of whether you write to a
tape or a pipe, and regardless of single- or multi-volume operation.
Ok, I suspected this.
For future tests with simulated buffering, just run "tar -b 20 -f -
[options and args] | dd bs=10240 count=[somesmallsize] of=/dev/null"
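For example, against the zf* test files from above, a concrete
(untested) instance would be:

$ tar -b 20 -c -v --listed-incremental=z.snar -f - zf* \
    | dd bs=10240 count=3 of=/dev/null

dd consumes three 10 KiB records and then closes the pipe, so tar's
next write fails with a broken pipe, much like the tape-full case in
note 2 (count=3 is an arbitrary illustrative value).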
I don't know about the structure of the snar file, but if it contains
a single timestamp (for example the start or the end of the most
recent backup, or the highest mtime encountered during the backup),
then it really can be updated only after all files have been backed
up. Otherwise it could advance past the mtime of a file that tar
intended to back up but missed because there was not enough space.
At least in 1.20, the .snar file is a list of individual file names
and some related timestamps. Simply omitting some file names tells
the next tar run that those files have not been backed up yet; I have
tested that, and I actually manipulate .snar files to provoke this
effect.
So writing a .snar file indicating that only some files were
(completely) backed up should be trivial. Timestamps for incompletely
processed dirs should be written into the .snar file as (real mtime
minus 2 seconds), so that the next run will re-enumerate the dir and
back up its metadata again.
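For anyone wanting to look inside a .snar before attempting such
surgery: tar 1.20 writes what the tar manual's appendix on snapshot
files calls the version 2 format, a header line followed by
NUL-separated fields, so a quick human-readable dump is:

$ tr '\0' '\n' < z.snar | less

Each directory record carries an NFS flag, an mtime (seconds and
nanoseconds), device and inode numbers, the directory name, and the
list of the directory's entries.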
(I never found myself in the vicinity of a tape drive; caveat emptor.)
How unusual, not even in a museum?
Anyway, LTO drives are very much like the 1/2" cartridges used when
GNU was young (and like the big old Ampex/Tandberg 1/2" tape drives
used as movie props), except for some minor details (such as Quantum
buying up the rights to the old DLT format, which forced the rest of
the industry to create a new, slightly different format called Linear
Tape-Open, and of course the larger capacity).
They have variable-length physical blocks on the tape and generally
support all of the mt(1) operations in actual hardware, via the Linux
kernel's (or another free/non-free OS's) default "st" driver. Speed
is high enough to max out the drive's own dedicated SCSI adapter. The
hardware programming documentation boils down to one sentence: "this
drive is compliant with the standard SCSI command set for tape
drives".
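For instance, the usual operations through the st driver, such as

$ mt -f /dev/nst0 status
$ mt -f /dev/nst0 eod

are carried out by the drive itself (the non-rewinding device name is
illustrative).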
Once recorded, a tape cartridge is a robust square lump of plastic
that can withstand more abuse than a portable hard drive. Some
manufacturers promise 30 years of data retention, meaning that a tape
written in 1980 under the first BSD UNIX should still be readable if
you can find a compatible tape drive and hook it up to a current GNU
system. Tandberg is still a player in this market. DEC sold their
tape division to Quantum years before becoming (via Compaq) part of
HP (which has its own tape division), so to read an old DEC tape, get
the proper drive from Quantum.