On Mon, Mar 30, 2026 at 2:33 AM Tomas Vondra wrote:
> On 3/29/26 00:12, Tom Lane wrote:
> > I've reproduced Thomas' failure on a local FreeBSD 15.0 image
> > using zfs, and confirmed that this cowboy hack fixes it:
> >
>
> Interesting. Then I guess it has to be due to some difference in ufs vs.
> zfs, when handling sparse files. It might be useful to add a bit more
> variation here, and switch some of the animals to non-default
> filesystems (not just the FreeBSD ones, which we seem to have only two
> that run reasonably often). I'd bet most of the linux systems run on
> ext4/xfs, few on btrfs/zfs.

UFS does have sparse files (its ancestor invented them some time
around (time_t) 0), it just doesn't make them unless you tell it to.
PostgreSQL only does that if you set wal_init_zero=false.

ZFS is different because it creates holes automagically when you write
zeroes, at least if compression is enabled so it has to scan all your
bytes anyway.

I was curious to know if BTRFS does that too, or hides
zero-compression at some lower invisible level:

$ echo "hello" > 1MB-sparse.dat
$ truncate -s 512KB 1MB-sparse.dat
$ echo "world" >> 1MB-sparse.dat
$ truncate -s 1MB 1MB-sparse.dat
$ ls -l 1MB-sparse.dat
-rw-rw-r-- 1 tmunro tmunro 1000000 Mar 30 10:11 1MB-sparse.dat
$ du -hs 1MB-sparse.dat
8.0K    1MB-sparse.dat
$ strace tar -S -cf foo.tar 1MB-sparse.dat 2>&1 | grep seek
lseek(4, 0, SEEK_DATA)                  = 0
lseek(4, 0, SEEK_HOLE)                  = 4096
lseek(4, 4096, SEEK_DATA)               = 512000
lseek(4, 512000, SEEK_HOLE)             = 516096
lseek(4, 516096, SEEK_DATA)             = -1 ENXIO (No such device or address)

... so that's a yes, lseek sees holes that we didn't ask it to make,
just like on ZFS, but the rest of this trace of GNU tar -S -cf is
interesting:

lseek(5, 0, SEEK_SET)                   = 0
lseek(5, 0, SEEK_SET)                   = 0
lseek(4, 0, SEEK_SET)                   = 0
lseek(4, 512000, SEEK_SET)              = 512000
lseek(4, 1000000, SEEK_SET)             = 1000000

It didn't write out PAX format!  Instead it replicated the holes into
the tar file itself with SEEK_SET.
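
If you want to repeat the SEEK_DATA/SEEK_HOLE probe on another
filesystem without strace, here is a minimal Python sketch of the same
loop that the first trace shows.  The function name data_segments is
made up for illustration, and it assumes a platform where os.SEEK_DATA
and os.SEEK_HOLE exist (Linux, FreeBSD).  It reports whatever the
filesystem chooses to report, so the answer will differ between, say,
ext4 and ZFS.

```python
import errno
import os


def data_segments(path):
    """Return [(start, end), ...] byte ranges the filesystem reports as data.

    Hypothetical helper: walks the file the way the strace output above
    does, alternating lseek(SEEK_DATA) and lseek(SEEK_HOLE).
    """
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        offset = 0
        while True:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    # No data at or after offset: the rest is hole or EOF.
                    break
                raise
            # There is always an implicit hole at EOF, so this cannot fail
            # with ENXIO once SEEK_DATA has succeeded.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return segments


if __name__ == "__main__":
    import sys

    for start, end in data_segments(sys.argv[1]):
        print(f"data {start}..{end}")
```

On a filesystem with no hole support the whole file should come back as
a single data segment, which is a quick way to tell the two behaviours
apart.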

$ strings foo.tar | grep Sparse

You have to add --format=posix to enable the GNU behaviour that BSD
tar is emulating by default:

$ tar --format=posix -S -cf foo.tar 1MB-sparse.dat
$ strings foo.tar | grep Sparse
./GNUSparseFile.4190/1MB-sparse.dat

I expected GNU tar to be forced to do that when writing to non-seekable
output, e.g. "tar -S -c 1MB-sparse.dat | cat > foo.tar", but somehow it
manages to write out only ~10KB of plain ustar format, which it can
still restore to the full 1MB apparent size using some other trick.
ENOTIME, I dunno how it's doing that.  Might be interesting to see
if pg_waldump can read it though, 'cause the bytes aren't all there.

BTW I confirmed that Apple tar does have -S by default too, it's just
that APFS doesn't make holes magically, so this test would presumably
have broken on a Mac if wal_init_zero had been forced to off (not
tested).

Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
unlikely to hit this in the wild, and the symptom is a confusing error
in a maintenance tool, not corruption, so I don't think this is a big
deal.  I might still try teaching the astreamer code to understand PAX
1.0 when it sees it in the next cycle though, for the benefit of
FreeBSD users.  A quick and dirty version could probably just unmangle
the name and skip the first block of data, since any valid WAL file
will not begin with a hole and valid WAL data will end at the first
hole and fail our verification, but of course a real implementation
should read the map properly[1]...

[1] https://www.gnu.org/software/tar/manual/html_node/PAX-1.html
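
To make the "read the map properly" part concrete, here is a rough
Python sketch of the two pieces, going by my reading of [1]: in PAX 1.0
the member's data starts with the sparse map as newline-terminated
decimal numbers (a count, then offset/size pairs), NUL-padded to a
512-byte boundary, and the header name is mangled as
"%d/GNUSparseFile.%p/%f".  The function names are made up, this is not
astreamer code, and a real reader should take the name from the
GNU.sparse.name extended header rather than unmangling the path.

```python
import posixpath
import re

BLOCK = 512  # tar block size


def unmangle_sparse_name(name):
    """Drop the GNUSparseFile.<pid> directory component, if present."""
    head, base = posixpath.split(name)
    parent, marker = posixpath.split(head)
    if re.fullmatch(r"GNUSparseFile\.\d+", marker):
        return posixpath.join(parent, base)
    return name


def read_sparse_map(payload):
    """Parse a PAX 1.0 sparse map from the start of a member's data.

    Returns (entries, data_start): entries is a list of (offset, size)
    pairs, data_start is where the real file data begins in payload.
    """
    pos = 0

    def next_number():
        nonlocal pos
        newline = payload.index(b"\n", pos)
        value = int(payload[pos:newline])
        pos = newline + 1
        return value

    count = next_number()
    entries = [(next_number(), next_number()) for _ in range(count)]
    # The map is padded with NULs up to the next block boundary.
    data_start = -(-pos // BLOCK) * BLOCK
    return entries, data_start


if __name__ == "__main__":
    # Made-up map shaped like the two data runs in the trace above; the
    # map GNU tar actually writes for that file may contain more entries.
    demo = b"2\n0\n4096\n512000\n4096\n".ljust(BLOCK, b"\0")
    print(read_sparse_map(demo))
    print(unmangle_sparse_name("./GNUSparseFile.4190/1MB-sparse.dat"))
```

The quick and dirty version would only use data_start and ignore the
entries, which is the "skip the first block" idea, and that only holds
while the map fits in one block.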

