On Wed, Mar 18, 2026 at 8:46 PM Amul Sul <[email protected]> wrote:
>
> On Wed, Mar 18, 2026 at 5:15 PM Amul Sul <[email protected]> wrote:
> >
> > On Wed, Mar 11, 2026 at 10:38 PM Andrew Dunstan <[email protected]> wrote:
> > > [...]
> > > Looks pretty good. I have squashed them into three patches I think are
> > > committable. Also attached is a diff showing what's changed - mainly this:
> > >
> > > . --follow + tar archive rejected (pg_waldump.c) — new validation
> > > prevents a confusing pg_fatal when combining --follow with a tar archive
> > > . error messages split (archive_waldump.c) — the single "could not read
> > > file" error is now two distinct messages: "WAL segment is too short"
> > > (truncated file) vs "unexpected end of archive" (archive EOF) - fixes an
> > > issue raised in review
> > > . hash table cleanup (archive_waldump.c) — free_archive_reader now
> > > iterates and frees all remaining hash entries and destroys the table
> >
> > The final squashed version looks good to me, thank you. But I would
> > like to propose splitting the 0001 patch into two separate commits: a
> > preparatory refactoring of the pg_waldump code and a standalone commit
> > that moves the tar archive detection and compression logic to a common
> > location, as the latter is an independent improvement to the existing
> > codebase. Additionally, since the test file refactoring was only kept
> > separate to facilitate the review and has already been reviewed, I
> > suggest merging those changes into the main feature patch, i.e. 0002.
> > All other elements should remain in a single preparatory refactoring
> > patch for pg_waldump.
> >
> > Attached is the version that includes the proposed split. No
> > additional changes to 0002 and 0003 patches.
>
> Added the two missing 'Reviewed-by' lines to the credit section of the
> commit message and did a minor optimization in get_archive_wal_entry.
>
Attaching an updated version. It includes some tweaks to code comments,
adds an assert inside get_archive_wal_entry(), moves the
archive_read_buf_size declaration and usage into an assert-enabled check,
and makes a minor change to precheck_tar_backup_file() to assign
out-variables only after successful validation.

Regards,
Amul
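As an aside, the precheck_tar_backup_file() tweak mentioned above — assigning out-variables only after successful validation — is a general defensive pattern; here is a minimal, hypothetical sketch of it (the function and its checks are illustrative, not the actual pg_verifybackup code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: validate into locals first and write to the
 * caller's out-variables only once every check has passed, so a failed
 * call never leaves them half-assigned.
 */
static bool
parse_wal_file_name(const char *fname, unsigned *tli_out, unsigned *seg_out)
{
	unsigned	tli,
				log,
				seg;

	/* A WAL segment file name is exactly 24 uppercase hex digits. */
	if (strlen(fname) != 24 ||
		strspn(fname, "0123456789ABCDEF") != 24 ||
		sscanf(fname, "%08X%08X%08X", &tli, &log, &seg) != 3)
		return false;			/* out-variables untouched on failure */

	(void) log;					/* high half of the segno, unused here */

	/* Assign out-variables only after successful validation. */
	*tli_out = tli;
	*seg_out = seg;
	return true;
}
```

The caller can then rely on its variables retaining their previous values whenever the function returns false, which avoids a class of "used partially-initialized result after error" bugs.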
From ba736014228ea250b8eb155f2776bb86feed2b55 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Thu, 19 Mar 2026 15:43:30 +0530 Subject: [PATCH v20 1/5] Move tar detection and compression logic to common. Consolidate tar archive identification and compression-type detection logic into a shared location. Currently used by pg_basebackup and pg_verifybackup, this functionality is also required for upcoming pg_waldump enhancements. This change promotes code reuse and simplifies maintenance across frontend tools. Author: Amul Sul <[email protected]> Reviewed-by: Robert Haas <[email protected]> Reviewed-by: Jakub Wartak <[email protected]> Reviewed-by: Chao Li <[email protected]> Reviewed-by: Euler Taveira <[email protected]> Reviewed-by: Andrew Dunstan <[email protected]> discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com --- src/bin/pg_basebackup/pg_basebackup.c | 36 +++++++---------------- src/bin/pg_verifybackup/pg_verifybackup.c | 12 +------- src/common/compression.c | 30 +++++++++++++++++++ src/include/common/compression.h | 2 ++ 4 files changed, 44 insertions(+), 36 deletions(-) diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c index fa169a8d642..c1a4672aa6f 100644 --- a/src/bin/pg_basebackup/pg_basebackup.c +++ b/src/bin/pg_basebackup/pg_basebackup.c @@ -1070,12 +1070,9 @@ CreateBackupStreamer(char *archive_name, char *spclocation, astreamer *manifest_inject_streamer = NULL; bool inject_manifest; bool is_tar, - is_tar_gz, - is_tar_lz4, - is_tar_zstd, is_compressed_tar; + pg_compress_algorithm compressed_tar_algorithm; bool must_parse_archive; - int archive_name_len = strlen(archive_name); /* * Normally, we emit the backup manifest as a separate file, but when @@ -1084,24 +1081,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation, */ inject_manifest = (format == 't' && strcmp(basedir, "-") == 0 && manifest); - /* Is this a tar archive? 
*/ - is_tar = (archive_name_len > 4 && - strcmp(archive_name + archive_name_len - 4, ".tar") == 0); - - /* Is this a .tar.gz archive? */ - is_tar_gz = (archive_name_len > 7 && - strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0); - - /* Is this a .tar.lz4 archive? */ - is_tar_lz4 = (archive_name_len > 8 && - strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0); - - /* Is this a .tar.zst archive? */ - is_tar_zstd = (archive_name_len > 8 && - strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0); + /* Check whether it is a tar archive and its compression type */ + is_tar = parse_tar_compress_algorithm(archive_name, + &compressed_tar_algorithm); /* Is this any kind of compressed tar? */ - is_compressed_tar = is_tar_gz || is_tar_lz4 || is_tar_zstd; + is_compressed_tar = (is_tar && + compressed_tar_algorithm != PG_COMPRESSION_NONE); /* * Injecting the manifest into a compressed tar file would be possible if @@ -1128,7 +1114,7 @@ CreateBackupStreamer(char *archive_name, char *spclocation, (spclocation == NULL && writerecoveryconf)); /* At present, we only know how to parse tar archives. */ - if (must_parse_archive && !is_tar && !is_compressed_tar) + if (must_parse_archive && !is_tar) { pg_log_error("cannot parse archive \"%s\"", archive_name); pg_log_error_detail("Only tar archives can be parsed."); @@ -1263,13 +1249,13 @@ CreateBackupStreamer(char *archive_name, char *spclocation, * If the user has requested a server compressed archive along with * archive extraction at client then we need to decompress it. 
*/ - if (format == 'p') + if (format == 'p' && is_compressed_tar) { - if (is_tar_gz) + if (compressed_tar_algorithm == PG_COMPRESSION_GZIP) streamer = astreamer_gzip_decompressor_new(streamer); - else if (is_tar_lz4) + else if (compressed_tar_algorithm == PG_COMPRESSION_LZ4) streamer = astreamer_lz4_decompressor_new(streamer); - else if (is_tar_zstd) + else if (compressed_tar_algorithm == PG_COMPRESSION_ZSTD) streamer = astreamer_zstd_decompressor_new(streamer); } diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index cbc9447384f..31f606c45b1 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -941,17 +941,7 @@ precheck_tar_backup_file(verifier_context *context, char *relpath, } /* Now, check the compression type of the tar */ - if (strcmp(suffix, ".tar") == 0) - compress_algorithm = PG_COMPRESSION_NONE; - else if (strcmp(suffix, ".tgz") == 0) - compress_algorithm = PG_COMPRESSION_GZIP; - else if (strcmp(suffix, ".tar.gz") == 0) - compress_algorithm = PG_COMPRESSION_GZIP; - else if (strcmp(suffix, ".tar.lz4") == 0) - compress_algorithm = PG_COMPRESSION_LZ4; - else if (strcmp(suffix, ".tar.zst") == 0) - compress_algorithm = PG_COMPRESSION_ZSTD; - else + if (!parse_tar_compress_algorithm(suffix, &compress_algorithm)) { report_backup_error(context, "file \"%s\" is not expected in a tar format backup", diff --git a/src/common/compression.c b/src/common/compression.c index 92cd4ec7a0d..fefbed68bea 100644 --- a/src/common/compression.c +++ b/src/common/compression.c @@ -41,6 +41,36 @@ static int expect_integer_value(char *keyword, char *value, static bool expect_boolean_value(char *keyword, char *value, pg_compress_specification *result); +/* + * Look up a compression algorithm by archive file extension. Returns true and + * sets *algorithm if the extension is recognized. Otherwise returns false. 
+ */ +bool +parse_tar_compress_algorithm(const char *fname, pg_compress_algorithm *algorithm) +{ + int fname_len = strlen(fname); + + if (fname_len >= 4 && + strcmp(fname + fname_len - 4, ".tar") == 0) + *algorithm = PG_COMPRESSION_NONE; + else if (fname_len >= 4 && + strcmp(fname + fname_len - 4, ".tgz") == 0) + *algorithm = PG_COMPRESSION_GZIP; + else if (fname_len >= 7 && + strcmp(fname + fname_len - 7, ".tar.gz") == 0) + *algorithm = PG_COMPRESSION_GZIP; + else if (fname_len >= 8 && + strcmp(fname + fname_len - 8, ".tar.lz4") == 0) + *algorithm = PG_COMPRESSION_LZ4; + else if (fname_len >= 8 && + strcmp(fname + fname_len - 8, ".tar.zst") == 0) + *algorithm = PG_COMPRESSION_ZSTD; + else + return false; + + return true; +} + /* * Look up a compression algorithm by name. Returns true and sets *algorithm * if the name is recognized. Otherwise returns false. diff --git a/src/include/common/compression.h b/src/include/common/compression.h index 6c745b90066..f99c747cdd3 100644 --- a/src/include/common/compression.h +++ b/src/include/common/compression.h @@ -41,6 +41,8 @@ typedef struct pg_compress_specification extern void parse_compress_options(const char *option, char **algorithm, char **detail); +extern bool parse_tar_compress_algorithm(const char *fname, + pg_compress_algorithm *algorithm); extern bool parse_compress_algorithm(char *name, pg_compress_algorithm *algorithm); extern const char *get_compress_algorithm_name(pg_compress_algorithm algorithm); -- 2.47.1
From 7b36f9bdebaf9be7e5adb9b8dac25394cb611d0b Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Thu, 19 Mar 2026 15:43:39 +0530 Subject: [PATCH v20 2/5] pg_waldump: Preparatory refactoring for tar archive WAL decoding. Several refactoring steps in preparation for adding tar archive WAL decoding support to pg_waldump: - Move XLogDumpPrivate and related declarations into a new pg_waldump.h header, allowing a second source file to share them. - Factor out required_read_len() so the read-size calculation can be reused for both regular WAL files and tar-archived WAL. - Move the WAL segment size variable into XLogDumpPrivate and rename it to segsize, making it accessible to the archive streamer code. Author: Amul Sul <[email protected]> Reviewed-by: Robert Haas <[email protected]> Reviewed-by: Jakub Wartak <[email protected]> Reviewed-by: Chao Li <[email protected]> Reviewed-by: Euler Taveira <[email protected]> Reviewed-by: Andrew Dunstan <[email protected]> discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com --- src/bin/pg_waldump/pg_waldump.c | 78 +++++++++++++++++++-------------- src/bin/pg_waldump/pg_waldump.h | 26 +++++++++++ 2 files changed, 70 insertions(+), 34 deletions(-) create mode 100644 src/bin/pg_waldump/pg_waldump.h diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index f3446385d6a..5d31b15dbd8 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -29,6 +29,7 @@ #include "common/logging.h" #include "common/relpath.h" #include "getopt_long.h" +#include "pg_waldump.h" #include "rmgrdesc.h" #include "storage/bufpage.h" @@ -43,14 +44,6 @@ static volatile sig_atomic_t time_to_stop = false; static const RelFileLocator emptyRelFileLocator = {0, 0, 0}; -typedef struct XLogDumpPrivate -{ - TimeLineID timeline; - XLogRecPtr startptr; - XLogRecPtr endptr; - bool endptr_reached; -} XLogDumpPrivate; - typedef struct XLogDumpConfig { /* display 
options */ @@ -333,6 +326,32 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz) return NULL; /* not reached */ } +/* + * Returns the size in bytes of the data to be read. Returns -1 if the end + * point has already been reached. + */ +static inline int +required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr, + int reqLen) +{ + int count = XLOG_BLCKSZ; + + if (XLogRecPtrIsValid(private->endptr)) + { + if (targetPagePtr + XLOG_BLCKSZ <= private->endptr) + count = XLOG_BLCKSZ; + else if (targetPagePtr + reqLen <= private->endptr) + count = private->endptr - targetPagePtr; + else + { + private->endptr_reached = true; + return -1; + } + } + + return count; +} + /* pg_waldump's XLogReaderRoutine->segment_open callback */ static void WALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo, @@ -390,21 +409,12 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, XLogRecPtr targetPtr, char *readBuff) { XLogDumpPrivate *private = state->private_data; - int count = XLOG_BLCKSZ; + int count = required_read_len(private, targetPagePtr, reqLen); WALReadError errinfo; - if (XLogRecPtrIsValid(private->endptr)) - { - if (targetPagePtr + XLOG_BLCKSZ <= private->endptr) - count = XLOG_BLCKSZ; - else if (targetPagePtr + reqLen <= private->endptr) - count = private->endptr - targetPagePtr; - else - { - private->endptr_reached = true; - return -1; - } - } + /* Bail out if the count to be read is not valid */ + if (count < 0) + return -1; if (!WALRead(state, readBuff, targetPagePtr, count, private->timeline, &errinfo)) @@ -801,7 +811,6 @@ main(int argc, char **argv) XLogRecPtr first_record; char *waldir = NULL; char *errormsg; - int WalSegSz; static struct option long_options[] = { {"bkp-details", no_argument, NULL, 'b'}, @@ -855,6 +864,7 @@ main(int argc, char **argv) memset(&stats, 0, sizeof(XLogStats)); private.timeline = 1; + private.segsize = 0; private.startptr = InvalidXLogRecPtr; private.endptr = 
InvalidXLogRecPtr; private.endptr_reached = false; @@ -1128,18 +1138,18 @@ main(int argc, char **argv) pg_fatal("could not open directory \"%s\": %m", waldir); } - waldir = identify_target_directory(waldir, fname, &WalSegSz); + waldir = identify_target_directory(waldir, fname, &private.segsize); fd = open_file_in_directory(waldir, fname); if (fd < 0) pg_fatal("could not open file \"%s\"", fname); close(fd); /* parse position from file */ - XLogFromFileName(fname, &private.timeline, &segno, WalSegSz); + XLogFromFileName(fname, &private.timeline, &segno, private.segsize); if (!XLogRecPtrIsValid(private.startptr)) - XLogSegNoOffsetToRecPtr(segno, 0, WalSegSz, private.startptr); - else if (!XLByteInSeg(private.startptr, segno, WalSegSz)) + XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr); + else if (!XLByteInSeg(private.startptr, segno, private.segsize)) { pg_log_error("start WAL location %X/%08X is not inside file \"%s\"", LSN_FORMAT_ARGS(private.startptr), @@ -1149,7 +1159,7 @@ main(int argc, char **argv) /* no second file specified, set end position */ if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr)) - XLogSegNoOffsetToRecPtr(segno + 1, 0, WalSegSz, private.endptr); + XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr); /* parse ENDSEG if passed */ if (optind + 1 < argc) @@ -1165,14 +1175,14 @@ main(int argc, char **argv) close(fd); /* parse position from file */ - XLogFromFileName(fname, &private.timeline, &endsegno, WalSegSz); + XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize); if (endsegno < segno) pg_fatal("ENDSEG %s is before STARTSEG %s", argv[optind + 1], argv[optind]); if (!XLogRecPtrIsValid(private.endptr)) - XLogSegNoOffsetToRecPtr(endsegno + 1, 0, WalSegSz, + XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize, private.endptr); /* set segno to endsegno for check of --end */ @@ -1180,8 +1190,8 @@ main(int argc, char **argv) } - if (!XLByteInSeg(private.endptr, segno, WalSegSz) 
&& - private.endptr != (segno + 1) * WalSegSz) + if (!XLByteInSeg(private.endptr, segno, private.segsize) && + private.endptr != (segno + 1) * private.segsize) { pg_log_error("end WAL location %X/%08X is not inside file \"%s\"", LSN_FORMAT_ARGS(private.endptr), @@ -1190,7 +1200,7 @@ main(int argc, char **argv) } } else - waldir = identify_target_directory(waldir, NULL, &WalSegSz); + waldir = identify_target_directory(waldir, NULL, &private.segsize); /* we don't know what to print */ if (!XLogRecPtrIsValid(private.startptr)) @@ -1203,7 +1213,7 @@ main(int argc, char **argv) /* we have everything we need, start reading */ xlogreader_state = - XLogReaderAllocate(WalSegSz, waldir, + XLogReaderAllocate(private.segsize, waldir, XL_ROUTINE(.page_read = WALDumpReadPage, .segment_open = WALDumpOpenSegment, .segment_close = WALDumpCloseSegment), @@ -1224,7 +1234,7 @@ main(int argc, char **argv) * a segment (e.g. we were used in file mode). */ if (first_record != private.startptr && - XLogSegmentOffset(private.startptr, WalSegSz) != 0) + XLogSegmentOffset(private.startptr, private.segsize) != 0) pg_log_info(ngettext("first record is after %X/%08X, at %X/%08X, skipping over %u byte", "first record is after %X/%08X, at %X/%08X, skipping over %u bytes", (first_record - private.startptr)), diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h new file mode 100644 index 00000000000..013b051506f --- /dev/null +++ b/src/bin/pg_waldump/pg_waldump.h @@ -0,0 +1,26 @@ +/*------------------------------------------------------------------------- + * + * pg_waldump.h - decode and display WAL + * + * Copyright (c) 2026, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/bin/pg_waldump/pg_waldump.h + *------------------------------------------------------------------------- + */ +#ifndef PG_WALDUMP_H +#define PG_WALDUMP_H + +#include "access/xlogdefs.h" + +/* Contains the necessary information to drive WAL decoding */ +typedef struct XLogDumpPrivate 
+{ + TimeLineID timeline; + int segsize; + XLogRecPtr startptr; + XLogRecPtr endptr; + bool endptr_reached; +} XLogDumpPrivate; + +#endif /* PG_WALDUMP_H */ -- 2.47.1
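The required_read_len() factored out in this patch is a pure clamp on the read size, which is what makes it reusable for the archive path. A self-contained sketch of that clamp, under the assumption that endptr == 0 stands in for InvalidXLogRecPtr (types and names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192				/* stand-in for XLOG_BLCKSZ */

typedef uint64_t RecPtr;		/* stand-in for XLogRecPtr */

/*
 * Sketch of required_read_len(): read a full page when one fits before
 * endptr, a partial page when only the requested bytes fit, and -1 once
 * the end point has been passed.
 */
static int
clamp_read_len(RecPtr endptr, RecPtr targetPagePtr, int reqLen)
{
	int			count = BLCKSZ;

	if (endptr != 0)			/* endptr == 0 means "no end point" */
	{
		if (targetPagePtr + BLCKSZ <= endptr)
			count = BLCKSZ;
		else if (targetPagePtr + reqLen <= endptr)
			count = (int) (endptr - targetPagePtr);
		else
			return -1;			/* end point reached */
	}

	return count;
}
```

The caller (WALDumpReadPage in the patch, and later the archive reader) only has to check for -1 and set endptr_reached; the size arithmetic lives in one place.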
From 4c191b1cbed7eaa19d5ecff3072ce807943ffdf1 Mon Sep 17 00:00:00 2001 From: Amul Sul <[email protected]> Date: Thu, 19 Mar 2026 15:43:46 +0530 Subject: [PATCH v20 3/5] pg_waldump: Add support for reading WAL from tar archives pg_waldump can now accept the path to a tar archive (optionally compressed with gzip, lz4, or zstd) containing WAL files and decode them. This was added primarily for pg_verifybackup, which previously had to skip WAL parsing for tar-format backups. The implementation uses the existing archive streamer infrastructure with a hash table to track WAL segments read from the archive. If WAL files within the archive are not in sequential order, out-of-order segments are written to a temporary directory (created via mkdtemp under $TMPDIR or the archive's directory) and read back when needed. An atexit callback ensures the temporary directory is cleaned up. The --follow option is not supported when reading from a tar archive. Author: Amul Sul <[email protected]> Reviewed-by: Robert Haas <[email protected]> Reviewed-by: Jakub Wartak <[email protected]> Reviewed-by: Chao Li <[email protected]> Reviewed-by: Euler Taveira <[email protected]> Reviewed-by: Andrew Dunstan <[email protected]> discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com --- doc/src/sgml/ref/pg_waldump.sgml | 23 +- src/bin/pg_waldump/Makefile | 7 +- src/bin/pg_waldump/archive_waldump.c | 847 +++++++++++++++++++++++++++ src/bin/pg_waldump/meson.build | 4 +- src/bin/pg_waldump/pg_waldump.c | 293 +++++++-- src/bin/pg_waldump/pg_waldump.h | 50 ++ src/bin/pg_waldump/t/001_basic.pl | 242 ++++++-- src/tools/pgindent/typedefs.list | 4 + 8 files changed, 1342 insertions(+), 128 deletions(-) create mode 100644 src/bin/pg_waldump/archive_waldump.c diff --git a/doc/src/sgml/ref/pg_waldump.sgml b/doc/src/sgml/ref/pg_waldump.sgml index d1715ff5124..9bbb4bd5772 100644 --- a/doc/src/sgml/ref/pg_waldump.sgml +++ b/doc/src/sgml/ref/pg_waldump.sgml @@ -141,13 
+141,21 @@ PostgreSQL documentation <term><option>--path=<replaceable>path</replaceable></option></term> <listitem> <para> - Specifies a directory to search for WAL segment files or a - directory with a <literal>pg_wal</literal> subdirectory that + Specifies a tar archive or a directory to search for WAL segment files + or a directory with a <literal>pg_wal</literal> subdirectory that contains such files. The default is to search in the current directory, the <literal>pg_wal</literal> subdirectory of the current directory, and the <literal>pg_wal</literal> subdirectory of <envar>PGDATA</envar>. </para> + <para> + If a tar archive is provided and its WAL segment files are not in + sequential order, those files will be written to a temporary directory + named starting with <filename>waldump_tmp</filename>. This directory will be + created inside the directory specified by the <envar>TMPDIR</envar> + environment variable if it is set; otherwise, it will be created within + the same directory as the tar archive. + </para> </listitem> </varlistentry> @@ -383,6 +391,17 @@ PostgreSQL documentation </para> </listitem> </varlistentry> + + <varlistentry> + <term><envar>TMPDIR</envar></term> + <listitem> + <para> + Directory in which to create temporary files when reading WAL from a + tar archive with out-of-order segment files. If not set, the temporary + directory is created within the same directory as the tar archive. + </para> + </listitem> + </varlistentry> </variablelist> </refsect1> diff --git a/src/bin/pg_waldump/Makefile b/src/bin/pg_waldump/Makefile index 4c1ee649501..aabb87566a2 100644 --- a/src/bin/pg_waldump/Makefile +++ b/src/bin/pg_waldump/Makefile @@ -3,6 +3,9 @@ PGFILEDESC = "pg_waldump - decode and display WAL" PGAPPICON=win32 +# make these available to TAP test scripts +export TAR + subdir = src/bin/pg_waldump top_builddir = ../../.. 
include $(top_builddir)/src/Makefile.global @@ -10,13 +13,15 @@ include $(top_builddir)/src/Makefile.global OBJS = \ $(RMGRDESCOBJS) \ $(WIN32RES) \ + archive_waldump.o \ compat.o \ pg_waldump.o \ rmgrdesc.o \ xlogreader.o \ xlogstats.o -override CPPFLAGS := -DFRONTEND $(CPPFLAGS) +override CPPFLAGS := -DFRONTEND -I$(libpq_srcdir) $(CPPFLAGS) +LDFLAGS_INTERNAL += -L$(top_builddir)/src/fe_utils -lpgfeutils RMGRDESCSOURCES = $(sort $(notdir $(wildcard $(top_srcdir)/src/backend/access/rmgrdesc/*desc*.c))) RMGRDESCOBJS = $(patsubst %.c,%.o,$(RMGRDESCSOURCES)) diff --git a/src/bin/pg_waldump/archive_waldump.c b/src/bin/pg_waldump/archive_waldump.c new file mode 100644 index 00000000000..9cbcae3e8af --- /dev/null +++ b/src/bin/pg_waldump/archive_waldump.c @@ -0,0 +1,847 @@ +/*------------------------------------------------------------------------- + * + * archive_waldump.c + * A generic facility for reading WAL data from tar archives via archive + * streamer. + * + * Portions Copyright (c) 2026, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/bin/pg_waldump/archive_waldump.c + * + *------------------------------------------------------------------------- + */ + +#include "postgres_fe.h" + +#include <unistd.h> + +#include "access/xlog_internal.h" +#include "common/file_perm.h" +#include "common/hashfn.h" +#include "common/logging.h" +#include "fe_utils/simple_list.h" +#include "pg_waldump.h" + +/* + * How many bytes should we try to read from a file at once? + */ +#define READ_CHUNK_SIZE (128 * 1024) + +/* Temporary exported WAL file directory */ +char *TmpWalSegDir = NULL; + +/* + * Check if the start segment number is zero; this indicates a request to read + * any WAL file. + */ +#define READ_ANY_WAL(privateInfo) ((privateInfo)->start_segno == 0) + +/* + * Hash entry representing a WAL segment retrieved from the archive. 
+ * + * While WAL segments are typically read sequentially, individual entries + * maintain their own buffers for the following reasons: + * + * 1. Boundary Handling: The archive streamer provides a continuous byte + * stream. A single streaming chunk may contain the end of one WAL segment + * and the start of the next. Separate buffers allow us to easily + * partition and track these bytes by their respective segments. + * + * 2. Out-of-Order Support: Dedicated buffers simplify logic when segments + * are archived or retrieved out of sequence. + * + * To minimize the memory footprint, entries and their associated buffers are + * freed immediately once consumed. Since pg_waldump does not request the same + * bytes twice, a segment is discarded as soon as pg_waldump moves past it. + */ +typedef struct ArchivedWALFile +{ + uint32 status; /* hash status */ + const char *fname; /* hash key: WAL segment name */ + + StringInfo buf; /* holds WAL bytes read from archive */ + bool spilled; /* true if the WAL data was spilled to a + * temporary file */ + + int read_len; /* total bytes of a WAL read from archive */ +} ArchivedWALFile; + +static uint32 hash_string_pointer(const char *s); +#define SH_PREFIX ArchivedWAL +#define SH_ELEMENT_TYPE ArchivedWALFile +#define SH_KEY_TYPE const char * +#define SH_KEY fname +#define SH_HASH_KEY(tb, key) hash_string_pointer(key) +#define SH_EQUAL(tb, a, b) (strcmp(a, b) == 0) +#define SH_SCOPE static inline +#define SH_RAW_ALLOCATOR pg_malloc0 +#define SH_DECLARE +#define SH_DEFINE +#include "lib/simplehash.h" + +typedef struct astreamer_waldump +{ + astreamer base; + XLogDumpPrivate *privateInfo; +} astreamer_waldump; + +static ArchivedWALFile *get_archive_wal_entry(const char *fname, + XLogDumpPrivate *privateInfo, + int WalSegSz); +static int read_archive_file(XLogDumpPrivate *privateInfo, Size count); +static void setup_tmpwal_dir(const char *waldir); +static void cleanup_tmpwal_dir_atexit(void); + +static FILE 
*prepare_tmp_write(const char *fname); +static void perform_tmp_write(const char *fname, StringInfo buf, FILE *file); + +static astreamer *astreamer_waldump_new(XLogDumpPrivate *privateInfo); +static void astreamer_waldump_content(astreamer *streamer, + astreamer_member *member, + const char *data, int len, + astreamer_archive_context context); +static void astreamer_waldump_finalize(astreamer *streamer); +static void astreamer_waldump_free(astreamer *streamer); + +static bool member_is_wal_file(astreamer_waldump *mystreamer, + astreamer_member *member, + char **fname); + +static const astreamer_ops astreamer_waldump_ops = { + .content = astreamer_waldump_content, + .finalize = astreamer_waldump_finalize, + .free = astreamer_waldump_free +}; + +/* + * Initializes the tar archive reader: opens the archive, builds a hash table + * for WAL entries, reads ahead until a full WAL page header is available to + * determine the WAL segment size, and computes start/end segment numbers for + * filtering. It also sets up a temporary directory for out-of-order WAL data + * and registers an atexit callback to clean it up. + */ +void +init_archive_reader(XLogDumpPrivate *privateInfo, const char *waldir, + int *WalSegSz, pg_compress_algorithm compression) +{ + int fd; + astreamer *streamer; + ArchivedWALFile *entry = NULL; + XLogLongPageHeader longhdr; + XLogSegNo segno; + TimeLineID timeline; + + /* Open tar archive and store its file descriptor */ + fd = open_file_in_directory(waldir, privateInfo->archive_name); + + if (fd < 0) + pg_fatal("could not open file \"%s\"", privateInfo->archive_name); + + privateInfo->archive_fd = fd; + + streamer = astreamer_waldump_new(privateInfo); + + /* We must first parse the tar archive. */ + streamer = astreamer_tar_parser_new(streamer); + + /* If the archive is compressed, decompress before parsing. 
*/ + if (compression == PG_COMPRESSION_GZIP) + streamer = astreamer_gzip_decompressor_new(streamer); + else if (compression == PG_COMPRESSION_LZ4) + streamer = astreamer_lz4_decompressor_new(streamer); + else if (compression == PG_COMPRESSION_ZSTD) + streamer = astreamer_zstd_decompressor_new(streamer); + + privateInfo->archive_streamer = streamer; + + /* + * Allocate a buffer for reading the archive file to facilitate content + * decoding; read requests must not exceed the allocated buffer size. + */ + privateInfo->archive_read_buf = pg_malloc(READ_CHUNK_SIZE); + +#ifdef USE_ASSERT_CHECKING + privateInfo->archive_read_buf_size = READ_CHUNK_SIZE; +#endif + + /* + * Hash table storing WAL entries read from the archive with an arbitrary + * initial size. + */ + privateInfo->archive_wal_htab = ArchivedWAL_create(8, NULL); + + /* + * Read until we have at least one full WAL page (XLOG_BLCKSZ bytes) from + * the first WAL segment in the archive so we can extract the WAL segment + * size from the long page header. + */ + while (entry == NULL || entry->buf->len < XLOG_BLCKSZ) + { + if (read_archive_file(privateInfo, XLOG_BLCKSZ) == 0) + pg_fatal("could not find WAL in archive \"%s\"", + privateInfo->archive_name); + + entry = privateInfo->cur_file; + } + + /* Set WalSegSz if WAL data is successfully read */ + longhdr = (XLogLongPageHeader) entry->buf->data; + + if (!IsValidWalSegSize(longhdr->xlp_seg_size)) + { + pg_log_error(ngettext("invalid WAL segment size in WAL file from archive \"%s\" (%d byte)", + "invalid WAL segment size in WAL file from archive \"%s\" (%d bytes)", + longhdr->xlp_seg_size), + privateInfo->archive_name, longhdr->xlp_seg_size); + pg_log_error_detail("The WAL segment size must be a power of two between 1 MB and 1 GB."); + exit(1); + } + + *WalSegSz = longhdr->xlp_seg_size; + + /* + * With the WAL segment size available, we can now initialize the + * dependent start and end segment numbers. 
+ */ + Assert(!XLogRecPtrIsInvalid(privateInfo->startptr)); + XLByteToSeg(privateInfo->startptr, privateInfo->start_segno, *WalSegSz); + + if (!XLogRecPtrIsInvalid(privateInfo->endptr)) + XLByteToSeg(privateInfo->endptr, privateInfo->end_segno, *WalSegSz); + + /* + * This WAL record was fetched before the filtering parameters + * (start_segno and end_segno) were fully initialized. Perform the + * relevance check against the user-provided range now; if the WAL falls + * outside this range, remove it from the hash table. Subsequent WAL will + * be filtered automatically by the archive streamer using the updated + * start_segno and end_segno values. + */ + XLogFromFileName(entry->fname, &timeline, &segno, privateInfo->segsize); + if (privateInfo->timeline != timeline || + privateInfo->start_segno > segno || + privateInfo->end_segno < segno) + free_archive_wal_entry(entry->fname, privateInfo); + + /* + * Setup temporary directory to store WAL segments and set up an exit + * callback to remove it upon completion. + */ + setup_tmpwal_dir(waldir); + atexit(cleanup_tmpwal_dir_atexit); +} + +/* + * Release the archive streamer chain and close the archive file. + */ +void +free_archive_reader(XLogDumpPrivate *privateInfo) +{ + /* + * NB: Normally, astreamer_finalize() is called before astreamer_free() to + * flush any remaining buffered data or to ensure the end of the tar + * archive is reached. However, when decoding WAL, once we hit the end + * LSN, any remaining buffered data or unread portion of the archive can + * be safely ignored. + */ + astreamer_free(privateInfo->archive_streamer); + + /* Free any remaining hash table entries and their buffers. 
*/
+	if (privateInfo->archive_wal_htab != NULL)
+	{
+		ArchivedWAL_iterator iter;
+		ArchivedWALFile *entry;
+
+		ArchivedWAL_start_iterate(privateInfo->archive_wal_htab, &iter);
+		while ((entry = ArchivedWAL_iterate(privateInfo->archive_wal_htab,
+											&iter)) != NULL)
+		{
+			if (entry->buf != NULL)
+				destroyStringInfo(entry->buf);
+		}
+		ArchivedWAL_destroy(privateInfo->archive_wal_htab);
+		privateInfo->archive_wal_htab = NULL;
+	}
+
+	/* Free the reusable read buffer. */
+	if (privateInfo->archive_read_buf != NULL)
+	{
+		pg_free(privateInfo->archive_read_buf);
+		privateInfo->archive_read_buf = NULL;
+	}
+
+	/* Close the file. */
+	if (close(privateInfo->archive_fd) != 0)
+		pg_log_error("could not close file \"%s\": %m",
+					 privateInfo->archive_name);
+}
+
+/*
+ * Copies the requested WAL data from the hash entry's buffer into readBuff.
+ * If the buffer does not yet contain the needed bytes, fetches more data from
+ * the tar archive via the archive streamer.
+ */
+int
+read_archive_wal_page(XLogDumpPrivate *privateInfo, XLogRecPtr targetPagePtr,
+					  Size count, char *readBuff, int WalSegSz)
+{
+	char	   *p = readBuff;
+	Size		nbytes = count;
+	XLogRecPtr	recptr = targetPagePtr;
+	XLogSegNo	segno;
+	char		fname[MAXFNAMELEN];
+	ArchivedWALFile *entry;
+
+	/* Identify the segment and locate its entry in the archive hash */
+	XLByteToSeg(targetPagePtr, segno, WalSegSz);
+	XLogFileName(fname, privateInfo->timeline, segno, WalSegSz);
+	entry = get_archive_wal_entry(fname, privateInfo, WalSegSz);
+
+	while (nbytes > 0)
+	{
+		char	   *buf = entry->buf->data;
+		int			bufLen = entry->buf->len;
+		XLogRecPtr	endPtr;
+		XLogRecPtr	startPtr;
+
+		/*
+		 * Calculate the LSN range currently residing in the buffer.
+		 *
+		 * read_len tracks total bytes received for this segment (including
+		 * already-discarded data), so endPtr is the LSN just past the last
+		 * buffered byte, and startPtr is the LSN of the first buffered byte.
+		 */
+		XLogSegNoOffsetToRecPtr(segno, entry->read_len, WalSegSz, endPtr);
+		startPtr = endPtr - bufLen;
+
+		/*
+		 * Copy the requested WAL record if it exists in the buffer.
+		 */
+		if (bufLen > 0 && startPtr <= recptr && recptr < endPtr)
+		{
+			int			copyBytes;
+			int			offset = recptr - startPtr;
+
+			/*
+			 * Given startPtr <= recptr < endPtr and a total buffer size
+			 * 'bufLen', the offset (recptr - startPtr) will always be less
+			 * than 'bufLen'.
+			 */
+			Assert(offset < bufLen);
+
+			copyBytes = Min(nbytes, bufLen - offset);
+			memcpy(p, buf + offset, copyBytes);
+
+			/* Update state for read */
+			recptr += copyBytes;
+			nbytes -= copyBytes;
+			p += copyBytes;
+		}
+		else
+		{
+			/*
+			 * Before starting the actual decoding loop, pg_waldump tries to
+			 * locate the first valid record from the user-specified start
+			 * position, which might not be the start of a WAL record and
+			 * could fall in the middle of a record that spans multiple pages.
+			 * Consequently, the valid start position the decoder is looking
+			 * for could be far away from that initial position.
+			 *
+			 * This may involve reading across multiple pages, and this
+			 * pre-reading fetches data in multiple rounds from the archive
+			 * streamer; normally, we would throw away existing buffer
+			 * contents to fetch the next set of data, but that existing data
+			 * might be needed once the main loop starts. Because previously
+			 * read data cannot be re-read by the archive streamer, we delay
+			 * resetting the buffer until the main decoding loop is entered.
+			 *
+			 * Once pg_waldump has entered the main loop, it may re-read the
+			 * currently active page, but never an older one; therefore, any
+			 * fully consumed WAL data preceding the current page can then be
+			 * safely discarded.
+			 */
+			if (privateInfo->decoding_started)
+			{
+				resetStringInfo(entry->buf);
+
+				/*
+				 * Push back the partial page data for the current page to the
+				 * buffer, ensuring a full page remains available for
+				 * re-reading if requested.
+				 */
+				if (p > readBuff)
+				{
+					Assert((count - nbytes) > 0);
+					appendBinaryStringInfo(entry->buf, readBuff, count - nbytes);
+				}
+			}
+
+			/*
+			 * Now, fetch more data. Raise an error if the archive streamer
+			 * has moved past our segment (meaning the WAL file in the archive
+			 * is shorter than expected) or if reading the archive reached
+			 * EOF.
+			 */
+			if (privateInfo->cur_file != entry)
+				pg_fatal("WAL segment \"%s\" in archive \"%s\" is too short: read %lld of %lld bytes",
+						 fname, privateInfo->archive_name,
+						 (long long int) (count - nbytes),
+						 (long long int) count);
+			if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+				pg_fatal("unexpected end of archive \"%s\" while reading \"%s\": read %lld of %lld bytes",
+						 privateInfo->archive_name, fname,
+						 (long long int) (count - nbytes),
+						 (long long int) count);
+		}
+	}
+
+	/*
+	 * Should have successfully read all the requested bytes or reported a
+	 * failure before this point.
+	 */
+	Assert(nbytes == 0);
+
+	/*
+	 * NB: We return count unchanged. We could return a boolean since we
+	 * either successfully read the WAL page or raise an error, but the caller
+	 * expects this value to be returned. The routine that reads WAL pages
+	 * from physical WAL files follows the same convention.
+	 */
+	return count;
+}
+
+/*
+ * Releases the buffer of a WAL entry that is no longer needed, preventing the
+ * accumulation of irrelevant WAL data. Also removes any associated temporary
+ * file and clears privateInfo->cur_file if it points to this entry, so the
+ * archive streamer skips subsequent data for it.
+ */
+void
+free_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo)
+{
+	ArchivedWALFile *entry;
+
+	entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+	if (entry == NULL)
+		return;
+
+	/* Destroy the buffer */
+	destroyStringInfo(entry->buf);
+	entry->buf = NULL;
+
+	/* Remove temporary file if any */
+	if (entry->spilled)
+	{
+		char		fpath[MAXPGPATH];
+
+		snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+		if (unlink(fpath) == 0)
+			pg_log_debug("removed file \"%s\"", fpath);
+	}
+
+	/* Clear cur_file if it points to the entry being freed */
+	if (privateInfo->cur_file == entry)
+		privateInfo->cur_file = NULL;
+
+	ArchivedWAL_delete_item(privateInfo->archive_wal_htab, entry);
+}
+
+/*
+ * Returns the archived WAL entry from the hash table if it already exists.
+ * Otherwise, reads more data from the archive until the requested entry is
+ * found. If the archive streamer is reading a WAL file from the archive that
+ * is not currently needed, that data is spilled to a temporary file for later
+ * retrieval.
+ */
+static ArchivedWALFile *
+get_archive_wal_entry(const char *fname, XLogDumpPrivate *privateInfo,
+					  int WalSegSz)
+{
+	ArchivedWALFile *entry = NULL;
+	FILE	   *write_fp = NULL;
+
+	/*
+	 * Search the hash table first. If the entry is found, return it.
+	 * Otherwise, the requested WAL entry hasn't been read from the archive
+	 * yet; invoke the archive streamer to fetch it.
+	 */
+	while (1)
+	{
+		/*
+		 * Search hash table.
+		 *
+		 * We perform the search inside the loop because a single iteration of
+		 * the archive reader may decompress and extract multiple files into
+		 * the hash table. One of these newly added files could be the one we
+		 * are seeking.
+		 */
+		entry = ArchivedWAL_lookup(privateInfo->archive_wal_htab, fname);
+
+		if (entry != NULL)
+			return entry;
+
+		/*
+		 * The WAL file entry currently being processed may change during
+		 * archive streamer execution. Therefore, maintain a local variable to
+		 * reference the previous entry, ensuring that any remaining data in
+		 * its buffer is successfully flushed to the temporary file before
+		 * switching to the next WAL entry.
+		 */
+		entry = privateInfo->cur_file;
+
+		/*
+		 * Fetch more data either when no current file is being tracked or
+		 * when its buffer has been fully flushed to the temporary file.
+		 */
+		if (entry == NULL || entry->buf->len == 0)
+		{
+			if (read_archive_file(privateInfo, READ_CHUNK_SIZE) == 0)
+				break;			/* archive file ended */
+		}
+
+		/*
+		 * Archive streamer is reading a non-WAL file or an irrelevant WAL
+		 * file.
+		 */
+		if (entry == NULL)
+			continue;
+
+		/*
+		 * Archive streamer is currently reading a file that isn't the one
+		 * asked for, but it's required in the future. It should be written to
+		 * a temporary location for retrieval when needed.
+		 */
+		Assert(strcmp(fname, entry->fname) != 0);
+
+		/* Create a temporary file if one does not already exist */
+		if (!entry->spilled)
+		{
+			write_fp = prepare_tmp_write(entry->fname);
+			entry->spilled = true;
+		}
+
+		/* Flush data from the buffer to the file */
+		perform_tmp_write(entry->fname, entry->buf, write_fp);
+		resetStringInfo(entry->buf);
+
+		/*
+		 * The change in the current segment entry indicates that the reading
+		 * of this file has ended.
+		 */
+		if (entry != privateInfo->cur_file && write_fp != NULL)
+		{
+			fclose(write_fp);
+			write_fp = NULL;
+		}
+	}
+
+	/* Requested WAL segment not found */
+	pg_fatal("could not find WAL \"%s\" in archive \"%s\"",
+			 fname, privateInfo->archive_name);
+}
+
+/*
+ * Reads the archive file and passes it to the archive streamer for
+ * decompression.
+ */
+static int
+read_archive_file(XLogDumpPrivate *privateInfo, Size count)
+{
+	int			rc;
+
+	/* The read request must not exceed the allocated buffer size.
*/
+	Assert(privateInfo->archive_read_buf_size >= count);
+
+	rc = read(privateInfo->archive_fd, privateInfo->archive_read_buf, count);
+	if (rc < 0)
+		pg_fatal("could not read file \"%s\": %m",
+				 privateInfo->archive_name);
+
+	/*
+	 * Decompress (if required), and then parse the previously read contents
+	 * of the tar file.
+	 */
+	if (rc > 0)
+		astreamer_content(privateInfo->archive_streamer, NULL,
+						  privateInfo->archive_read_buf, rc,
+						  ASTREAMER_UNKNOWN);
+
+	return rc;
+}
+
+/*
+ * Set up a temporary directory for storing WAL segments.
+ */
+static void
+setup_tmpwal_dir(const char *waldir)
+{
+	char	   *template;
+
+	/*
+	 * Use the directory specified by the TMPDIR environment variable. If it's
+	 * not set, fall back to the provided WAL directory to store WAL files
+	 * temporarily.
+	 */
+	template = psprintf("%s/waldump_tmp-XXXXXX",
+						getenv("TMPDIR") ? getenv("TMPDIR") : waldir);
+	TmpWalSegDir = mkdtemp(template);
+
+	if (TmpWalSegDir == NULL)
+		pg_fatal("could not create directory \"%s\": %m", template);
+
+	canonicalize_path(TmpWalSegDir);
+
+	pg_log_debug("created directory \"%s\"", TmpWalSegDir);
+}
+
+/*
+ * Remove temporary directory at exit, if any.
+ */
+static void
+cleanup_tmpwal_dir_atexit(void)
+{
+	rmtree(TmpWalSegDir, true);
+}
+
+/*
+ * Create an empty placeholder file and return its handle.
+ */
+static FILE *
+prepare_tmp_write(const char *fname)
+{
+	char		fpath[MAXPGPATH];
+	FILE	   *file;
+
+	snprintf(fpath, MAXPGPATH, "%s/%s", TmpWalSegDir, fname);
+
+	/* Create an empty placeholder */
+	file = fopen(fpath, PG_BINARY_W);
+	if (file == NULL)
+		pg_fatal("could not create file \"%s\": %m", fpath);
+
+#ifndef WIN32
+	if (chmod(fpath, pg_file_create_mode))
+		pg_fatal("could not set permissions on file \"%s\": %m",
+				 fpath);
+#endif
+
+	pg_log_debug("spilling to temporary file \"%s\"", fpath);
+
+	return file;
+}
+
+/*
+ * Write buffer data to the given file handle.
+ */
+static void
+perform_tmp_write(const char *fname, StringInfo buf, FILE *file)
+{
+	Assert(file);
+
+	errno = 0;
+	if (buf->len > 0 && fwrite(buf->data, buf->len, 1, file) != 1)
+	{
+		/*
+		 * If write didn't set errno, assume problem is no disk space
+		 */
+		if (errno == 0)
+			errno = ENOSPC;
+		pg_fatal("could not write to file \"%s/%s\": %m", TmpWalSegDir, fname);
+	}
+}
+
+/*
+ * Create an astreamer that can read WAL from a tar file.
+ */
+static astreamer *
+astreamer_waldump_new(XLogDumpPrivate *privateInfo)
+{
+	astreamer_waldump *streamer;
+
+	streamer = palloc0_object(astreamer_waldump);
+	*((const astreamer_ops **) &streamer->base.bbs_ops) =
+		&astreamer_waldump_ops;
+
+	streamer->privateInfo = privateInfo;
+
+	return &streamer->base;
+}
+
+/*
+ * Main entry point of the archive streamer for reading WAL data from a tar
+ * file. If a member is identified as a valid WAL file, a hash entry is created
+ * for it, and its contents are copied into that entry's buffer, making them
+ * accessible to the decoding routine.
+ */
+static void
+astreamer_waldump_content(astreamer *streamer, astreamer_member *member,
+						  const char *data, int len,
+						  astreamer_archive_context context)
+{
+	astreamer_waldump *mystreamer = (astreamer_waldump *) streamer;
+	XLogDumpPrivate *privateInfo = mystreamer->privateInfo;
+
+	Assert(context != ASTREAMER_UNKNOWN);
+
+	switch (context)
+	{
+		case ASTREAMER_MEMBER_HEADER:
+			{
+				char	   *fname = NULL;
+				ArchivedWALFile *entry;
+				bool		found;
+
+				pg_log_debug("reading \"%s\"", member->pathname);
+
+				if (!member_is_wal_file(mystreamer, member, &fname))
+					break;
+
+				/*
+				 * Further checks are skipped if any WAL file can be read.
+				 * This typically occurs during initial verification.
+				 */
+				if (!READ_ANY_WAL(privateInfo))
+				{
+					XLogSegNo	segno;
+					TimeLineID	timeline;
+
+					/*
+					 * Skip the segment if the timeline does not match or if
+					 * it falls outside the caller-specified range.
+ */
+					XLogFromFileName(fname, &timeline, &segno, privateInfo->segsize);
+					if (privateInfo->timeline != timeline ||
+						privateInfo->start_segno > segno ||
+						privateInfo->end_segno < segno)
+					{
+						pfree(fname);
+						break;
+					}
+				}
+
+				entry = ArchivedWAL_insert(privateInfo->archive_wal_htab,
+										   fname, &found);
+
+				/*
+				 * Shouldn't happen, but if it does, simply ignore the
+				 * duplicate WAL file.
+				 */
+				if (found)
+				{
+					pg_log_warning("ignoring duplicate WAL \"%s\" found in archive \"%s\"",
+								   member->pathname, privateInfo->archive_name);
+					pfree(fname);
+					break;
+				}
+
+				entry->buf = makeStringInfo();
+				entry->spilled = false;
+				entry->read_len = 0;
+				privateInfo->cur_file = entry;
+			}
+			break;
+
+		case ASTREAMER_MEMBER_CONTENTS:
+			if (privateInfo->cur_file)
+			{
+				appendBinaryStringInfo(privateInfo->cur_file->buf, data, len);
+				privateInfo->cur_file->read_len += len;
+			}
+			break;
+
+		case ASTREAMER_MEMBER_TRAILER:
+
+			/*
+			 * End of this tar member; mark cur_file NULL so subsequent
+			 * content callbacks (if any) know no WAL file is currently
+			 * active.
+			 */
+			privateInfo->cur_file = NULL;
+			break;
+
+		case ASTREAMER_ARCHIVE_TRAILER:
+			break;
+
+		default:
+			/* Shouldn't happen. */
+			pg_fatal("unexpected state while parsing tar file");
+	}
+}
+
+/*
+ * End-of-stream processing for an astreamer_waldump stream. This is a
+ * terminal streamer so it must have no successor.
+ */
+static void
+astreamer_waldump_finalize(astreamer *streamer)
+{
+	Assert(streamer->bbs_next == NULL);
+}
+
+/*
+ * Free memory associated with an astreamer_waldump stream.
+ */
+static void
+astreamer_waldump_free(astreamer *streamer)
+{
+	Assert(streamer->bbs_next == NULL);
+	pfree(streamer);
+}
+
+/*
+ * Returns true if the archive member name matches the WAL naming format. If
+ * successful, it also outputs the WAL segment name.
+ */
+static bool
+member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member,
+				   char **fname)
+{
+	int			pathlen;
+	char		pathname[MAXPGPATH];
+	char	   *filename;
+
+	/* We are only interested in normal files */
+	if (member->is_directory || member->is_link)
+		return false;
+
+	if (strlen(member->pathname) < XLOG_FNAME_LEN)
+		return false;
+
+	/*
+	 * For a correct comparison, we must remove any '.' or '..' components
+	 * from the member pathname. Similar to member_verify_header(), we prepend
+	 * './' to the path so that canonicalize_path() can properly resolve and
+	 * strip these references from the tar member name.
+	 */
+	snprintf(pathname, MAXPGPATH, "./%s", member->pathname);
+	canonicalize_path(pathname);
+	pathlen = strlen(pathname);
+
+	/* WAL files from the top-level or pg_wal directory will be decoded */
+	if (pathlen > XLOG_FNAME_LEN &&
+		strncmp(pathname, XLOGDIR, strlen(XLOGDIR)) != 0)
+		return false;
+
+	/* WAL file may appear with a full path (e.g., pg_wal/<name>) */
+	filename = pathname + (pathlen - XLOG_FNAME_LEN);
+	if (!IsXLogFileName(filename))
+		return false;
+
+	*fname = pnstrdup(filename, XLOG_FNAME_LEN);
+
+	return true;
+}
+
+/*
+ * Helper function for WAL file hash table.
+ */ +static uint32 +hash_string_pointer(const char *s) +{ + unsigned char *ss = (unsigned char *) s; + + return hash_bytes(ss, strlen(s)); +} diff --git a/src/bin/pg_waldump/meson.build b/src/bin/pg_waldump/meson.build index 633a9874bb5..5296f21b82c 100644 --- a/src/bin/pg_waldump/meson.build +++ b/src/bin/pg_waldump/meson.build @@ -1,6 +1,7 @@ # Copyright (c) 2022-2026, PostgreSQL Global Development Group pg_waldump_sources = files( + 'archive_waldump.c', 'compat.c', 'pg_waldump.c', 'rmgrdesc.c', @@ -18,7 +19,7 @@ endif pg_waldump = executable('pg_waldump', pg_waldump_sources, - dependencies: [frontend_code, lz4, zstd], + dependencies: [frontend_code, libpq, lz4, zstd], c_args: ['-DFRONTEND'], # needed for xlogreader et al kwargs: default_bin_args, ) @@ -29,6 +30,7 @@ tests += { 'sd': meson.current_source_dir(), 'bd': meson.current_build_dir(), 'tap': { + 'env': {'TAR': tar.found() ? tar.full_path() : ''}, 'tests': [ 't/001_basic.pl', 't/002_save_fullpage.pl', diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c index 5d31b15dbd8..f28153165e6 100644 --- a/src/bin/pg_waldump/pg_waldump.c +++ b/src/bin/pg_waldump/pg_waldump.c @@ -176,7 +176,7 @@ split_path(const char *path, char **dir, char **fname) * * return a read only fd */ -static int +int open_file_in_directory(const char *directory, const char *fname) { int fd = -1; @@ -327,8 +327,8 @@ identify_target_directory(char *directory, char *fname, int *WalSegSz) } /* - * Returns the size in bytes of the data to be read. Returns -1 if the end - * point has already been reached. + * Returns the number of bytes to read for the given page. Returns -1 if + * the requested range has already been reached or exceeded. 
*/ static inline int required_read_len(XLogDumpPrivate *private, XLogRecPtr targetPagePtr, @@ -440,6 +440,106 @@ WALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, return count; } +/* + * pg_waldump's XLogReaderRoutine->segment_open callback to support dumping WAL + * files from tar archives. Segment tracking is handled by + * TarWALDumpReadPage, so no action is needed here. + */ +static void +TarWALDumpOpenSegment(XLogReaderState *state, XLogSegNo nextSegNo, + TimeLineID *tli_p) +{ + /* No action needed */ +} + +/* + * pg_waldump's XLogReaderRoutine->segment_close callback to support dumping + * WAL files from tar archives. Segment tracking is handled by + * TarWALDumpReadPage, so no action is needed here. + */ +static void +TarWALDumpCloseSegment(XLogReaderState *state) +{ + /* No action needed */ +} + +/* + * pg_waldump's XLogReaderRoutine->page_read callback to support dumping WAL + * files from tar archives. + */ +static int +TarWALDumpReadPage(XLogReaderState *state, XLogRecPtr targetPagePtr, int reqLen, + XLogRecPtr targetPtr, char *readBuff) +{ + XLogDumpPrivate *private = state->private_data; + int count = required_read_len(private, targetPagePtr, reqLen); + int WalSegSz = state->segcxt.ws_segsize; + XLogSegNo curSegNo; + + /* Bail out if the count to be read is not valid */ + if (count < 0) + return -1; + + /* + * If the target page is in a different segment, free the buffer and/or + * temporary file disk space occupied by the previous segment's data. + * Since pg_waldump never requests the same WAL bytes twice, moving to a + * new segment implies the previous buffer's data and that segment will + * not be needed again. + * + * Afterward, check for the next required WAL segment's physical existence + * in the temporary directory first before invoking the archive streamer. 
+ */ + curSegNo = state->seg.ws_segno; + if (!XLByteInSeg(targetPagePtr, curSegNo, WalSegSz)) + { + char fname[MAXFNAMELEN]; + XLogSegNo nextSegNo; + + /* + * Calculate the next WAL segment to be decoded from the given page + * pointer. + */ + XLByteToSeg(targetPagePtr, nextSegNo, WalSegSz); + state->seg.ws_tli = private->timeline; + state->seg.ws_segno = nextSegNo; + + /* Close the WAL segment file if it is currently open */ + if (state->seg.ws_file >= 0) + { + close(state->seg.ws_file); + state->seg.ws_file = -1; + } + + /* + * If in pre-reading mode (prior to actual decoding), do not delete + * any entries that might be requested again once the decoding loop + * starts. For more details, see the comments in + * read_archive_wal_page(). + */ + if (private->decoding_started && curSegNo < nextSegNo) + { + XLogFileName(fname, state->seg.ws_tli, curSegNo, WalSegSz); + free_archive_wal_entry(fname, private); + } + + /* + * If the next segment exists, open it and continue reading from there + */ + XLogFileName(fname, state->seg.ws_tli, nextSegNo, WalSegSz); + state->seg.ws_file = open_file_in_directory(TmpWalSegDir, fname); + } + + /* Continue reading from the open WAL segment, if any */ + if (state->seg.ws_file >= 0) + return WALDumpReadPage(state, targetPagePtr, count, targetPtr, + readBuff); + + /* Otherwise, read the WAL page from the archive streamer */ + return read_archive_wal_page(private, targetPagePtr, count, readBuff, + WalSegSz); +} + /* * Boolean to return whether the given WAL record matches a specific relation * and optionally block. 
@@ -777,8 +877,8 @@ usage(void) printf(_(" -F, --fork=FORK only show records that modify blocks in fork FORK;\n" " valid names are main, fsm, vm, init\n")); printf(_(" -n, --limit=N number of records to display\n")); - printf(_(" -p, --path=PATH directory in which to find WAL segment files or a\n" - " directory with a ./pg_wal that contains such files\n" + printf(_(" -p, --path=PATH a tar archive or a directory in which to find WAL segment files or\n" + " a directory with a pg_wal subdirectory containing such files\n" " (default: current directory, ./pg_wal, $PGDATA/pg_wal)\n")); printf(_(" -q, --quiet do not print any output, except for errors\n")); printf(_(" -r, --rmgr=RMGR only show records generated by resource manager RMGR;\n" @@ -810,7 +910,9 @@ main(int argc, char **argv) XLogRecord *record; XLogRecPtr first_record; char *waldir = NULL; + char *walpath = NULL; char *errormsg; + pg_compress_algorithm compression = PG_COMPRESSION_NONE; static struct option long_options[] = { {"bkp-details", no_argument, NULL, 'b'}, @@ -868,6 +970,10 @@ main(int argc, char **argv) private.startptr = InvalidXLogRecPtr; private.endptr = InvalidXLogRecPtr; private.endptr_reached = false; + private.decoding_started = false; + private.archive_name = NULL; + private.start_segno = 0; + private.end_segno = UINT64_MAX; config.quiet = false; config.bkp_details = false; @@ -943,7 +1049,7 @@ main(int argc, char **argv) } break; case 'p': - waldir = pg_strdup(optarg); + walpath = pg_strdup(optarg); break; case 'q': config.quiet = true; @@ -1107,12 +1213,21 @@ main(int argc, char **argv) goto bad_argument; } - if (waldir != NULL) + if (walpath != NULL) { + /* validate path points to tar archive */ + if (parse_tar_compress_algorithm(walpath, &compression)) + { + char *fname = NULL; + + split_path(walpath, &waldir, &fname); + + private.archive_name = fname; + } /* validate path points to directory */ - if (!verify_directory(waldir)) + else if (!verify_directory(walpath)) { - 
pg_log_error("could not open directory \"%s\": %m", waldir); + pg_log_error("could not open directory \"%s\": %m", walpath); goto bad_argument; } } @@ -1128,6 +1243,17 @@ main(int argc, char **argv) int fd; XLogSegNo segno; + /* + * If a tar archive is passed using the --path option, all other + * arguments become unnecessary. + */ + if (private.archive_name) + { + pg_log_error("unnecessary command-line arguments specified with tar archive (first is \"%s\")", + argv[optind]); + goto bad_argument; + } + split_path(argv[optind], &directory, &fname); if (waldir == NULL && directory != NULL) @@ -1138,69 +1264,75 @@ main(int argc, char **argv) pg_fatal("could not open directory \"%s\": %m", waldir); } - waldir = identify_target_directory(waldir, fname, &private.segsize); - fd = open_file_in_directory(waldir, fname); - if (fd < 0) - pg_fatal("could not open file \"%s\"", fname); - close(fd); - - /* parse position from file */ - XLogFromFileName(fname, &private.timeline, &segno, private.segsize); - - if (!XLogRecPtrIsValid(private.startptr)) - XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr); - else if (!XLByteInSeg(private.startptr, segno, private.segsize)) + if (fname != NULL && parse_tar_compress_algorithm(fname, &compression)) { - pg_log_error("start WAL location %X/%08X is not inside file \"%s\"", - LSN_FORMAT_ARGS(private.startptr), - fname); - goto bad_argument; + private.archive_name = fname; } - - /* no second file specified, set end position */ - if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr)) - XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr); - - /* parse ENDSEG if passed */ - if (optind + 1 < argc) + else { - XLogSegNo endsegno; - - /* ignore directory, already have that */ - split_path(argv[optind + 1], &directory, &fname); - + waldir = identify_target_directory(waldir, fname, &private.segsize); fd = open_file_in_directory(waldir, fname); if (fd < 0) pg_fatal("could not open file \"%s\"", fname); 
close(fd); /* parse position from file */ - XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize); + XLogFromFileName(fname, &private.timeline, &segno, private.segsize); - if (endsegno < segno) - pg_fatal("ENDSEG %s is before STARTSEG %s", - argv[optind + 1], argv[optind]); + if (!XLogRecPtrIsValid(private.startptr)) + XLogSegNoOffsetToRecPtr(segno, 0, private.segsize, private.startptr); + else if (!XLByteInSeg(private.startptr, segno, private.segsize)) + { + pg_log_error("start WAL location %X/%08X is not inside file \"%s\"", + LSN_FORMAT_ARGS(private.startptr), + fname); + goto bad_argument; + } - if (!XLogRecPtrIsValid(private.endptr)) - XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize, - private.endptr); + /* no second file specified, set end position */ + if (!(optind + 1 < argc) && !XLogRecPtrIsValid(private.endptr)) + XLogSegNoOffsetToRecPtr(segno + 1, 0, private.segsize, private.endptr); - /* set segno to endsegno for check of --end */ - segno = endsegno; - } + /* parse ENDSEG if passed */ + if (optind + 1 < argc) + { + XLogSegNo endsegno; + /* ignore directory, already have that */ + split_path(argv[optind + 1], &directory, &fname); - if (!XLByteInSeg(private.endptr, segno, private.segsize) && - private.endptr != (segno + 1) * private.segsize) - { - pg_log_error("end WAL location %X/%08X is not inside file \"%s\"", - LSN_FORMAT_ARGS(private.endptr), - argv[argc - 1]); - goto bad_argument; + fd = open_file_in_directory(waldir, fname); + if (fd < 0) + pg_fatal("could not open file \"%s\"", fname); + close(fd); + + /* parse position from file */ + XLogFromFileName(fname, &private.timeline, &endsegno, private.segsize); + + if (endsegno < segno) + pg_fatal("ENDSEG %s is before STARTSEG %s", + argv[optind + 1], argv[optind]); + + if (!XLogRecPtrIsValid(private.endptr)) + XLogSegNoOffsetToRecPtr(endsegno + 1, 0, private.segsize, + private.endptr); + + /* set segno to endsegno for check of --end */ + segno = endsegno; + } + + if 
(!XLByteInSeg(private.endptr, segno, private.segsize) && + private.endptr != (segno + 1) * private.segsize) + { + pg_log_error("end WAL location %X/%08X is not inside file \"%s\"", + LSN_FORMAT_ARGS(private.endptr), + argv[argc - 1]); + goto bad_argument; + } } } - else - waldir = identify_target_directory(waldir, NULL, &private.segsize); + else if (!private.archive_name) + waldir = identify_target_directory(walpath, NULL, &private.segsize); /* we don't know what to print */ if (!XLogRecPtrIsValid(private.startptr)) @@ -1209,15 +1341,46 @@ main(int argc, char **argv) goto bad_argument; } + /* --follow is not supported with tar archives */ + if (config.follow && private.archive_name) + { + pg_log_error("--follow is not supported when reading from a tar archive"); + goto bad_argument; + } + /* done with argument parsing, do the actual work */ /* we have everything we need, start reading */ - xlogreader_state = - XLogReaderAllocate(private.segsize, waldir, - XL_ROUTINE(.page_read = WALDumpReadPage, - .segment_open = WALDumpOpenSegment, - .segment_close = WALDumpCloseSegment), - &private); + if (private.archive_name) + { + /* + * A NULL WAL directory indicates that the archive file is located in + * the current working directory. 
+ */ + if (waldir == NULL) + waldir = pg_strdup("."); + + /* Set up for reading tar file */ + init_archive_reader(&private, waldir, &private.segsize, compression); + + /* Routine to decode WAL files in tar archive */ + xlogreader_state = + XLogReaderAllocate(private.segsize, waldir, + XL_ROUTINE(.page_read = TarWALDumpReadPage, + .segment_open = TarWALDumpOpenSegment, + .segment_close = TarWALDumpCloseSegment), + &private); + } + else + { + xlogreader_state = + XLogReaderAllocate(private.segsize, waldir, + XL_ROUTINE(.page_read = WALDumpReadPage, + .segment_open = WALDumpOpenSegment, + .segment_close = WALDumpCloseSegment), + &private); + } + if (!xlogreader_state) pg_fatal("out of memory while allocating a WAL reading processor"); @@ -1245,6 +1408,9 @@ main(int argc, char **argv) if (config.stats == true && !config.quiet) stats.startptr = first_record; + /* Flag indicating that the decoding loop has been entered */ + private.decoding_started = true; + for (;;) { if (time_to_stop) @@ -1326,6 +1492,9 @@ main(int argc, char **argv) XLogReaderFree(xlogreader_state); + if (private.archive_name) + free_archive_reader(&private); + return EXIT_SUCCESS; bad_argument: diff --git a/src/bin/pg_waldump/pg_waldump.h b/src/bin/pg_waldump/pg_waldump.h index 013b051506f..fd25792b33a 100644 --- a/src/bin/pg_waldump/pg_waldump.h +++ b/src/bin/pg_waldump/pg_waldump.h @@ -12,6 +12,14 @@ #define PG_WALDUMP_H #include "access/xlogdefs.h" +#include "fe_utils/astreamer.h" + +/* Forward declaration */ +struct ArchivedWALFile; +struct ArchivedWAL_hash; + +/* Temporary directory for spilling out-of-order WAL segments from archives */ +extern char *TmpWalSegDir; /* Contains the necessary information to drive WAL decoding */ typedef struct XLogDumpPrivate @@ -21,6 +29,48 @@ typedef struct XLogDumpPrivate XLogRecPtr startptr; XLogRecPtr endptr; bool endptr_reached; + bool decoding_started; + + /* Fields required to read WAL from archive */ + char *archive_name; /* tar archive filename */ + int 
archive_fd; /* File descriptor for the open tar file */ + + astreamer *archive_streamer; + char *archive_read_buf; /* Reusable read buffer for archive I/O */ + +#ifdef USE_ASSERT_CHECKING + Size archive_read_buf_size; +#endif + + /* What the archive streamer is currently reading */ + struct ArchivedWALFile *cur_file; + + /* + * Hash table of all WAL files that the archive stream has read, including + * the one currently in progress. + */ + struct ArchivedWAL_hash *archive_wal_htab; + + /* + * Pre-computed segment numbers derived from startptr and endptr. Caching + * them avoids repeated XLByteToSeg() calls when filtering each archive + * member against the requested WAL range. + */ + XLogSegNo start_segno; + XLogSegNo end_segno; } XLogDumpPrivate; +extern int open_file_in_directory(const char *directory, const char *fname); + +extern void init_archive_reader(XLogDumpPrivate *privateInfo, + const char *waldir, int *WalSegSz, + pg_compress_algorithm compression); +extern void free_archive_reader(XLogDumpPrivate *privateInfo); +extern int read_archive_wal_page(XLogDumpPrivate *privateInfo, + XLogRecPtr targetPagePtr, + Size count, char *readBuff, + int WalSegSz); +extern void free_archive_wal_entry(const char *fname, + XLogDumpPrivate *privateInfo); + #endif /* PG_WALDUMP_H */ diff --git a/src/bin/pg_waldump/t/001_basic.pl b/src/bin/pg_waldump/t/001_basic.pl index 5db5d20136f..94c58187412 100644 --- a/src/bin/pg_waldump/t/001_basic.pl +++ b/src/bin/pg_waldump/t/001_basic.pl @@ -3,9 +3,13 @@ use strict; use warnings FATAL => 'all'; +use Cwd; use PostgreSQL::Test::Cluster; use PostgreSQL::Test::Utils; use Test::More; +use List::Util qw(shuffle); + +my $tar = $ENV{TAR}; program_help_ok('pg_waldump'); program_version_ok('pg_waldump'); @@ -162,6 +166,42 @@ CREATE TABLESPACE ts1 LOCATION '$tblspc_path'; DROP TABLESPACE ts1; }); +# Test: Decode a continuation record (contrecord) that spans multiple WAL +# segments. 
+# +# Now consume all remaining room in the current WAL segment, leaving +# space enough only for the start of a largish record. +$node->safe_psql( + 'postgres', q{ +DO $$ +DECLARE + wal_segsize int := setting::int FROM pg_settings WHERE name = 'wal_segment_size'; + remain int; + iters int := 0; +BEGIN + LOOP + INSERT into t1(b) + select repeat(encode(sha256(g::text::bytea), 'hex'), (random() * 15 + 1)::int) + from generate_series(1, 10) g; + + remain := wal_segsize - (pg_current_wal_insert_lsn() - '0/0') % wal_segsize; + IF remain < 2 * setting::int from pg_settings where name = 'block_size' THEN + RAISE log 'exiting after % iterations, % bytes to end of WAL segment', iters, remain; + EXIT; + END IF; + iters := iters + 1; + END LOOP; +END +$$; +}); + +my $contrecord_lsn = $node->safe_psql('postgres', + 'SELECT pg_current_wal_insert_lsn()'); +# Generate contrecord record +$node->safe_psql('postgres', + qq{SELECT pg_logical_emit_message(true, 'test 026', repeat('xyzxz', 123456))} +); + my ($end_lsn, $end_walfile) = split /\|/, $node->safe_psql('postgres', q{SELECT pg_current_wal_insert_lsn(), pg_walfile_name(pg_current_wal_insert_lsn())} @@ -198,28 +238,6 @@ command_like( ], qr/./, 'runs with start and end segment specified'); -command_fails_like( - [ 'pg_waldump', '--path' => $node->data_dir ], - qr/error: no start WAL location given/, - 'path option requires start location'); -command_like( - [ - 'pg_waldump', - '--path' => $node->data_dir, - '--start' => $start_lsn, - '--end' => $end_lsn, - ], - qr/./, - 'runs with path option and start and end locations'); -command_fails_like( - [ - 'pg_waldump', - '--path' => $node->data_dir, - '--start' => $start_lsn, - ], - qr/error: error in WAL record at/, - 'falling off the end of the WAL results in an error'); - command_like( [ 'pg_waldump', '--quiet', @@ -227,22 +245,16 @@ command_like( ], qr/^$/, 'no output with --quiet option'); -command_fails_like( - [ - 'pg_waldump', '--quiet', - '--path' => $node->data_dir, - 
'--start' => $start_lsn - ], - qr/error: error in WAL record at/, - 'errors are shown with --quiet'); - # Test for: Display a message that we're skipping data if `from` # wasn't a pointer to the start of a record. +sub test_pg_waldump_skip_bytes { + my ($path, $startlsn, $endlsn) = @_; + # Construct a new LSN that is one byte past the original # start_lsn. - my ($part1, $part2) = split qr{/}, $start_lsn; + my ($part1, $part2) = split qr{/}, $startlsn; my $lsn2 = hex $part2; $lsn2++; my $new_start = sprintf("%s/%X", $part1, $lsn2); @@ -252,7 +264,8 @@ command_fails_like( my $result = IPC::Run::run [ 'pg_waldump', '--start' => $new_start, - $node->data_dir . '/pg_wal/' . $start_walfile + '--end' => $endlsn, + '--path' => $path, ], '>' => \$stdout, '2>' => \$stderr; @@ -266,15 +279,15 @@ command_fails_like( sub test_pg_waldump { local $Test::Builder::Level = $Test::Builder::Level + 1; - my @opts = @_; + my ($path, $startlsn, $endlsn, @opts) = @_; my ($stdout, $stderr); my $result = IPC::Run::run [ 'pg_waldump', - '--path' => $node->data_dir, - '--start' => $start_lsn, - '--end' => $end_lsn, + '--start' => $startlsn, + '--end' => $endlsn, + '--path' => $path, @opts ], '>' => \$stdout, @@ -286,40 +299,145 @@ sub test_pg_waldump return @lines; } -my @lines; +# Create a tar archive, shuffle the file order +sub generate_archive +{ + my ($archive, $directory, $compression_flags) = @_; -@lines = test_pg_waldump; -is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines'); + my @files; + opendir my $dh, $directory or die "opendir: $!"; + while (my $entry = readdir $dh) { + # Skip '.' and '..' + next if $entry eq '.' 
|| $entry eq '..'; + push @files, $entry; + } + closedir $dh; -@lines = test_pg_waldump('--limit' => 6); -is(@lines, 6, 'limit option observed'); + @files = shuffle @files; -@lines = test_pg_waldump('--fullpage'); -is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW'); + # move into the WAL directory before archiving files + my $cwd = getcwd; + chdir($directory) || die "chdir: $!"; + command_ok([$tar, $compression_flags, $archive, @files]); + chdir($cwd) || die "chdir: $!"; +} -@lines = test_pg_waldump('--stats'); -like($lines[0], qr/WAL statistics/, "statistics on stdout"); -is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); +my $tmp_dir = PostgreSQL::Test::Utils::tempdir_short(); -@lines = test_pg_waldump('--stats=record'); -like($lines[0], qr/WAL statistics/, "statistics on stdout"); -is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); +my @scenarios = ( + { + 'path' => $node->data_dir, + 'is_archive' => 0, + 'enabled' => 1 + }, + { + 'path' => "$tmp_dir/pg_wal.tar", + 'compression_method' => 'none', + 'compression_flags' => '-cf', + 'is_archive' => 1, + 'enabled' => 1 + }, + { + 'path' => "$tmp_dir/pg_wal.tar.gz", + 'compression_method' => 'gzip', + 'compression_flags' => '-czf', + 'is_archive' => 1, + 'enabled' => check_pg_config("#define HAVE_LIBZ 1") + }); -@lines = test_pg_waldump('--rmgr' => 'Btree'); -is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines'); +for my $scenario (@scenarios) +{ + my $path = $scenario->{'path'}; -@lines = test_pg_waldump('--fork' => 'init'); -is(grep(!/fork init/, @lines), 0, 'only init fork lines'); + SKIP: + { + skip "tar command is not available", 56 + if !defined $tar && $scenario->{'is_archive'}; + skip "$scenario->{'compression_method'} compression not supported by this build", 56 + if !$scenario->{'enabled'} && $scenario->{'is_archive'}; -@lines = test_pg_waldump( - '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid"); -is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, 
@lines), - 0, 'only lines for selected relation'); + # create pg_wal archive + if ($scenario->{'is_archive'}) + { + generate_archive($path, + $node->data_dir . '/pg_wal', + $scenario->{'compression_flags'}); + } -@lines = test_pg_waldump( - '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid", - '--block' => 1); -is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block'); + command_fails_like( + [ 'pg_waldump', '--path' => $path ], + qr/error: no start WAL location given/, + 'path option requires start location'); + command_like( + [ + 'pg_waldump', + '--path' => $path, + '--start' => $start_lsn, + '--end' => $end_lsn, + ], + qr/./, + 'runs with path option and start and end locations'); + command_fails_like( + [ + 'pg_waldump', + '--path' => $path, + '--start' => $start_lsn, + ], + qr/error: error in WAL record at/, + 'falling off the end of the WAL results in an error'); + command_fails_like( + [ + 'pg_waldump', '--quiet', + '--path' => $path, + '--start' => $start_lsn + ], + qr/error: error in WAL record at/, + 'errors are shown with --quiet'); + + test_pg_waldump_skip_bytes($path, $start_lsn, $end_lsn); + + my @lines = test_pg_waldump($path, $start_lsn, $end_lsn); + is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines'); + + @lines = test_pg_waldump($path, $contrecord_lsn, $end_lsn); + is(grep(!/^rmgr: \w/, @lines), 0, 'all output lines are rmgr lines'); + + test_pg_waldump_skip_bytes($path, $contrecord_lsn, $end_lsn); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--limit' => 6); + is(@lines, 6, 'limit option observed'); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fullpage'); + is(grep(!/^rmgr:.*\bFPW\b/, @lines), 0, 'all output lines are FPW'); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--stats'); + like($lines[0], qr/WAL statistics/, "statistics on stdout"); + is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, 
'--stats=record'); + like($lines[0], qr/WAL statistics/, "statistics on stdout"); + is(grep(/^rmgr:/, @lines), 0, 'no rmgr lines output'); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--rmgr' => 'Btree'); + is(grep(!/^rmgr: Btree/, @lines), 0, 'only Btree lines'); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, '--fork' => 'init'); + is(grep(!/fork init/, @lines), 0, 'only init fork lines'); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, + '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_t1_oid"); + is(grep(!/rel $default_ts_oid\/$postgres_db_oid\/$rel_t1_oid/, @lines), + 0, 'only lines for selected relation'); + + @lines = test_pg_waldump($path, $start_lsn, $end_lsn, + '--relation' => "$default_ts_oid/$postgres_db_oid/$rel_i1a_oid", + '--block' => 1); + is(grep(!/\bblk 1\b/, @lines), 0, 'only lines for selected block'); + + # Cleanup. + unlink $path if $scenario->{'is_archive'}; + } +} done_testing(); diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list index 174e2798443..3e2fc711a3e 100644 --- a/src/tools/pgindent/typedefs.list +++ b/src/tools/pgindent/typedefs.list @@ -147,6 +147,9 @@ ArchiveOpts ArchiveShutdownCB ArchiveStartupCB ArchiveStreamState +ArchivedWALFile +ArchivedWAL_hash +ArchivedWAL_iterator ArchiverOutput ArchiverStage ArrayAnalyzeExtraData @@ -3542,6 +3545,7 @@ astreamer_recovery_injector astreamer_tar_archiver astreamer_tar_parser astreamer_verify +astreamer_waldump astreamer_zstd_frame auth_password_hook_typ autovac_table -- 2.47.1
From cccb149f57bd3323506d459d005c7552c19aa07d Mon Sep 17 00:00:00 2001 From: Andrew Dunstan <[email protected]> Date: Thu, 19 Mar 2026 15:43:53 +0530 Subject: [PATCH v20 4/5] pg_verifybackup: Enable WAL parsing for tar-format backups Now that pg_waldump supports reading WAL from tar archives, remove the restriction that forced --no-parse-wal for tar-format backups. pg_verifybackup now automatically locates the WAL archive: it looks for a separate pg_wal.tar first, then falls back to the main base.tar. A new --wal-path option (replacing the old --wal-directory, which is kept as a silent alias) accepts either a directory or a tar archive path. The default WAL directory preparation is deferred until the backup format is known, since tar-format backups resolve the WAL path differently from plain-format ones. Author: Amul Sul <[email protected]> Reviewed-by: Robert Haas <[email protected]> Reviewed-by: Jakub Wartak <[email protected]> Reviewed-by: Chao Li <[email protected]> Reviewed-by: Euler Taveira <[email protected]> Reviewed-by: Andrew Dunstan <[email protected]> discussion: https://postgr.es/m/CAAJ_b94bqdWN3h2J-PzzzQ2Npbwct5ZQHggn_QoYGhC2rn-=WQ@mail.gmail.com --- doc/src/sgml/ref/pg_verifybackup.sgml | 14 ++- src/bin/pg_verifybackup/pg_verifybackup.c | 96 ++++++++++++------- src/bin/pg_verifybackup/t/002_algorithm.pl | 4 - src/bin/pg_verifybackup/t/003_corruption.pl | 4 +- src/bin/pg_verifybackup/t/007_wal.pl | 20 +++- src/bin/pg_verifybackup/t/008_untar.pl | 5 +- src/bin/pg_verifybackup/t/010_client_untar.pl | 5 +- 7 files changed, 91 insertions(+), 57 deletions(-) diff --git a/doc/src/sgml/ref/pg_verifybackup.sgml b/doc/src/sgml/ref/pg_verifybackup.sgml index 61c12975e4a..1695cfe91c8 100644 --- a/doc/src/sgml/ref/pg_verifybackup.sgml +++ b/doc/src/sgml/ref/pg_verifybackup.sgml @@ -36,10 +36,7 @@ PostgreSQL documentation <literal>backup_manifest</literal> generated by the server at the time of the backup. 
The backup may be stored either in the "plain" or the "tar" format; this includes tar-format backups compressed with any algorithm - supported by <application>pg_basebackup</application>. However, at present, - <literal>WAL</literal> verification is supported only for plain-format - backups. Therefore, if the backup is stored in tar-format, the - <literal>-n, --no-parse-wal</literal> option should be used. + supported by <application>pg_basebackup</application>. </para> <para> @@ -261,12 +258,13 @@ PostgreSQL documentation <varlistentry> <term><option>-w <replaceable class="parameter">path</replaceable></option></term> - <term><option>--wal-directory=<replaceable class="parameter">path</replaceable></option></term> + <term><option>--wal-path=<replaceable class="parameter">path</replaceable></option></term> <listitem> <para> - Try to parse WAL files stored in the specified directory, rather than - in <literal>pg_wal</literal>. This may be useful if the backup is - stored in a separate location from the WAL archive. + Try to parse WAL files stored in the specified directory or tar + archive, rather than in <literal>pg_wal</literal>. This may be + useful if the backup is stored in a separate location from the WAL + archive. </para> </listitem> </varlistentry> diff --git a/src/bin/pg_verifybackup/pg_verifybackup.c b/src/bin/pg_verifybackup/pg_verifybackup.c index 31f606c45b1..b60ab8739d5 100644 --- a/src/bin/pg_verifybackup/pg_verifybackup.c +++ b/src/bin/pg_verifybackup/pg_verifybackup.c @@ -74,7 +74,9 @@ pg_noreturn static void report_manifest_error(JsonManifestParseContext *context, const char *fmt,...) 
pg_attribute_printf(2, 3); -static void verify_tar_backup(verifier_context *context, DIR *dir); +static void verify_tar_backup(verifier_context *context, DIR *dir, + char **base_archive_path, + char **wal_archive_path); static void verify_plain_backup_directory(verifier_context *context, char *relpath, char *fullpath, DIR *dir); @@ -83,7 +85,9 @@ static void verify_plain_backup_file(verifier_context *context, char *relpath, static void verify_control_file(const char *controlpath, uint64 manifest_system_identifier); static void precheck_tar_backup_file(verifier_context *context, char *relpath, - char *fullpath, SimplePtrList *tarfiles); + char *fullpath, SimplePtrList *tarfiles, + char **base_archive_path, + char **wal_archive_path); static void verify_tar_file(verifier_context *context, char *relpath, char *fullpath, astreamer *streamer); static void report_extra_backup_files(verifier_context *context); @@ -93,7 +97,7 @@ static void verify_file_checksum(verifier_context *context, uint8 *buffer); static void parse_required_wal(verifier_context *context, char *pg_waldump_path, - char *wal_directory); + char *wal_path); static astreamer *create_archive_verifier(verifier_context *context, char *archive_name, Oid tblspc_oid, @@ -126,7 +130,8 @@ main(int argc, char **argv) {"progress", no_argument, NULL, 'P'}, {"quiet", no_argument, NULL, 'q'}, {"skip-checksums", no_argument, NULL, 's'}, - {"wal-directory", required_argument, NULL, 'w'}, + {"wal-path", required_argument, NULL, 'w'}, + {"wal-directory", required_argument, NULL, 'w'}, /* deprecated */ {NULL, 0, NULL, 0} }; @@ -135,7 +140,9 @@ main(int argc, char **argv) char *manifest_path = NULL; bool no_parse_wal = false; bool quiet = false; - char *wal_directory = NULL; + char *wal_path = NULL; + char *base_archive_path = NULL; + char *wal_archive_path = NULL; char *pg_waldump_path = NULL; DIR *dir; @@ -221,8 +228,8 @@ main(int argc, char **argv) context.skip_checksums = true; break; case 'w': - wal_directory = 
pstrdup(optarg); - canonicalize_path(wal_directory); + wal_path = pstrdup(optarg); + canonicalize_path(wal_path); break; default: /* getopt_long already emitted a complaint */ @@ -285,10 +292,6 @@ main(int argc, char **argv) manifest_path = psprintf("%s/backup_manifest", context.backup_directory); - /* By default, look for the WAL in the backup directory, too. */ - if (wal_directory == NULL) - wal_directory = psprintf("%s/pg_wal", context.backup_directory); - /* * Try to read the manifest. We treat any errors encountered while parsing * the manifest as fatal; there doesn't seem to be much point in trying to @@ -331,17 +334,6 @@ main(int argc, char **argv) pfree(path); } - /* - * XXX: In the future, we should consider enhancing pg_waldump to read WAL - * files from an archive. - */ - if (!no_parse_wal && context.format == 't') - { - pg_log_error("pg_waldump cannot read tar files"); - pg_log_error_hint("You must use -n/--no-parse-wal when verifying a tar-format backup."); - exit(1); - } - /* * Perform the appropriate type of verification appropriate based on the * backup format. This will close 'dir'. @@ -350,7 +342,7 @@ main(int argc, char **argv) verify_plain_backup_directory(&context, NULL, context.backup_directory, dir); else - verify_tar_backup(&context, dir); + verify_tar_backup(&context, dir, &base_archive_path, &wal_archive_path); /* * The "matched" flag should now be set on every entry in the hash table. @@ -368,12 +360,35 @@ main(int argc, char **argv) if (context.format == 'p' && !context.skip_checksums) verify_backup_checksums(&context); + /* + * By default, WAL files are expected to be found in the backup directory + * for plain-format backups. In the case of tar-format backups, if a + * separate WAL archive is not found, the WAL files are most likely + * included within the main data directory archive. 
+ */ + if (wal_path == NULL) + { + if (context.format == 'p') + wal_path = psprintf("%s/pg_wal", context.backup_directory); + else if (wal_archive_path) + wal_path = wal_archive_path; + else if (base_archive_path) + wal_path = base_archive_path; + else + { + pg_log_error("WAL archive not found"); + pg_log_error_hint("Specify the correct path using the option -w/--wal-path. " + "Or you must use -n/--no-parse-wal when verifying a tar-format backup."); + exit(1); + } + } + /* * Try to parse the required ranges of WAL records, unless we were told * not to do so. */ if (!no_parse_wal) - parse_required_wal(&context, pg_waldump_path, wal_directory); + parse_required_wal(&context, pg_waldump_path, wal_path); /* * If everything looks OK, tell the user this, unless we were asked to @@ -787,7 +802,8 @@ verify_control_file(const char *controlpath, uint64 manifest_system_identifier) * close when we're done with it. */ static void -verify_tar_backup(verifier_context *context, DIR *dir) +verify_tar_backup(verifier_context *context, DIR *dir, char **base_archive_path, + char **wal_archive_path) { struct dirent *dirent; SimplePtrList tarfiles = {NULL, NULL}; @@ -816,7 +832,8 @@ verify_tar_backup(verifier_context *context, DIR *dir) char *fullpath; fullpath = psprintf("%s/%s", context->backup_directory, filename); - precheck_tar_backup_file(context, filename, fullpath, &tarfiles); + precheck_tar_backup_file(context, filename, fullpath, &tarfiles, + base_archive_path, wal_archive_path); pfree(fullpath); } } @@ -875,17 +892,21 @@ verify_tar_backup(verifier_context *context, DIR *dir) * * The arguments to this function are mostly the same as the * verify_plain_backup_file. The additional argument outputs a list of valid - * tar files. + * tar files, along with the full paths to the main archive and the WAL + * directory archive. 
*/ static void precheck_tar_backup_file(verifier_context *context, char *relpath, - char *fullpath, SimplePtrList *tarfiles) + char *fullpath, SimplePtrList *tarfiles, + char **base_archive_path, char **wal_archive_path) { struct stat sb; Oid tblspc_oid = InvalidOid; pg_compress_algorithm compress_algorithm; tar_file *tar; char *suffix = NULL; + bool is_base_archive = false; + bool is_wal_archive = false; /* Should be tar format backup */ Assert(context->format == 't'); @@ -918,9 +939,15 @@ precheck_tar_backup_file(verifier_context *context, char *relpath, * extension such as .gz, .lz4, or .zst. */ if (strncmp("base", relpath, 4) == 0) + { suffix = relpath + 4; + is_base_archive = true; + } else if (strncmp("pg_wal", relpath, 6) == 0) + { suffix = relpath + 6; + is_wal_archive = true; + } else { /* Expected a <tablespaceoid>.tar file here. */ @@ -953,8 +980,13 @@ precheck_tar_backup_file(verifier_context *context, char *relpath, * Ignore WALs, as reading and verification will be handled through * pg_waldump. 
*/ - if (strncmp("pg_wal", relpath, 6) == 0) + if (is_wal_archive) + { + *wal_archive_path = pstrdup(fullpath); return; + } + else if (is_base_archive) + *base_archive_path = pstrdup(fullpath); /* * Append the information to the list for complete verification at a later @@ -1188,7 +1220,7 @@ verify_file_checksum(verifier_context *context, manifest_file *m, */ static void parse_required_wal(verifier_context *context, char *pg_waldump_path, - char *wal_directory) + char *wal_path) { manifest_data *manifest = context->manifest; manifest_wal_range *this_wal_range = manifest->first_wal_range; @@ -1198,7 +1230,7 @@ parse_required_wal(verifier_context *context, char *pg_waldump_path, char *pg_waldump_cmd; pg_waldump_cmd = psprintf("\"%s\" --quiet --path=\"%s\" --timeline=%u --start=%X/%08X --end=%X/%08X\n", - pg_waldump_path, wal_directory, this_wal_range->tli, + pg_waldump_path, wal_path, this_wal_range->tli, LSN_FORMAT_ARGS(this_wal_range->start_lsn), LSN_FORMAT_ARGS(this_wal_range->end_lsn)); fflush(NULL); @@ -1366,7 +1398,7 @@ usage(void) printf(_(" -P, --progress show progress information\n")); printf(_(" -q, --quiet do not print any output, except for errors\n")); printf(_(" -s, --skip-checksums skip checksum verification\n")); - printf(_(" -w, --wal-directory=PATH use specified path for WAL files\n")); + printf(_(" -w, --wal-path=PATH use specified path for WAL files\n")); printf(_(" -V, --version output version information, then exit\n")); printf(_(" -?, --help show this help, then exit\n")); printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT); diff --git a/src/bin/pg_verifybackup/t/002_algorithm.pl b/src/bin/pg_verifybackup/t/002_algorithm.pl index 0556191ec9d..edc515d5904 100644 --- a/src/bin/pg_verifybackup/t/002_algorithm.pl +++ b/src/bin/pg_verifybackup/t/002_algorithm.pl @@ -30,10 +30,6 @@ sub test_checksums { # Add switch to get a tar-format backup push @backup, ('--format' => 'tar'); - - # Add switch to skip WAL verification, which is not yet 
supported for - # tar-format backups - push @verify, ('--no-parse-wal'); } # A backup with a bogus algorithm should fail. diff --git a/src/bin/pg_verifybackup/t/003_corruption.pl b/src/bin/pg_verifybackup/t/003_corruption.pl index b1d65b8aa0f..882d75d9dc2 100644 --- a/src/bin/pg_verifybackup/t/003_corruption.pl +++ b/src/bin/pg_verifybackup/t/003_corruption.pl @@ -193,10 +193,8 @@ for my $scenario (@scenario) command_ok([ $tar, '-cf' => "$tar_backup_path/base.tar", '.' ]); chdir($cwd) || die "chdir: $!"; - # Now check that the backup no longer verifies. We must use -n - # here, because pg_waldump can't yet read WAL from a tarfile. command_fails_like( - [ 'pg_verifybackup', '--no-parse-wal', $tar_backup_path ], + [ 'pg_verifybackup', $tar_backup_path ], $scenario->{'fails_like'}, "corrupt backup fails verification: $name"); diff --git a/src/bin/pg_verifybackup/t/007_wal.pl b/src/bin/pg_verifybackup/t/007_wal.pl index 79087a1f6be..0e0377bfacc 100644 --- a/src/bin/pg_verifybackup/t/007_wal.pl +++ b/src/bin/pg_verifybackup/t/007_wal.pl @@ -42,10 +42,10 @@ command_ok([ 'pg_verifybackup', '--no-parse-wal', $backup_path ], command_ok( [ 'pg_verifybackup', - '--wal-directory' => $relocated_pg_wal, + '--wal-path' => $relocated_pg_wal, $backup_path ], - '--wal-directory can be used to specify WAL directory'); + '--wal-path can be used to specify WAL directory'); # Move directory back to original location. rename($relocated_pg_wal, $original_pg_wal) || die "rename pg_wal back: $!"; @@ -90,4 +90,20 @@ command_ok( [ 'pg_verifybackup', $backup_path2 ], 'valid base backup with timeline > 1'); +# Test WAL verification for a tar-format backup with a separate pg_wal.tar, +# as produced by pg_basebackup --format=tar --wal-method=stream. +my $backup_path3 = $primary->backup_dir . 
'/test_tar_wal'; +$primary->command_ok( + [ + 'pg_basebackup', + '--pgdata' => $backup_path3, + '--no-sync', + '--format' => 'tar', + '--checkpoint' => 'fast' + ], + "tar backup with separate pg_wal.tar"); +command_ok( + [ 'pg_verifybackup', $backup_path3 ], + 'WAL verification succeeds with separate pg_wal.tar'); + done_testing(); diff --git a/src/bin/pg_verifybackup/t/008_untar.pl b/src/bin/pg_verifybackup/t/008_untar.pl index ae67ae85a31..161c08c190d 100644 --- a/src/bin/pg_verifybackup/t/008_untar.pl +++ b/src/bin/pg_verifybackup/t/008_untar.pl @@ -47,7 +47,6 @@ my $tsoid = $primary->safe_psql( SELECT oid FROM pg_tablespace WHERE spcname = 'regress_ts1')); my $backup_path = $primary->backup_dir . '/server-backup'; -my $extract_path = $primary->backup_dir . '/extracted-backup'; my @test_configuration = ( { @@ -123,14 +122,12 @@ for my $tc (@test_configuration) # Verify tar backup. $primary->command_ok( [ - 'pg_verifybackup', '--no-parse-wal', - '--exit-on-error', $backup_path, + 'pg_verifybackup', '--exit-on-error', $backup_path, ], "verify backup, compression $method"); # Cleanup. rmtree($backup_path); - rmtree($extract_path); } } diff --git a/src/bin/pg_verifybackup/t/010_client_untar.pl b/src/bin/pg_verifybackup/t/010_client_untar.pl index 1ac7b5db75a..9670fbe4fda 100644 --- a/src/bin/pg_verifybackup/t/010_client_untar.pl +++ b/src/bin/pg_verifybackup/t/010_client_untar.pl @@ -32,7 +32,6 @@ print $jf $junk_data; close $jf; my $backup_path = $primary->backup_dir . '/client-backup'; -my $extract_path = $primary->backup_dir . '/extracted-backup'; my @test_configuration = ( { @@ -137,13 +136,11 @@ for my $tc (@test_configuration) # Verify tar backup. $primary->command_ok( [ - 'pg_verifybackup', '--no-parse-wal', - '--exit-on-error', $backup_path, + 'pg_verifybackup', '--exit-on-error', $backup_path, ], "verify backup, compression $method"); # Cleanup. - rmtree($extract_path); rmtree($backup_path); } } -- 2.47.1
