Unexpected result count from update statement on partitioned table

2020-12-17 Thread Craig McIlwee
Hello,

Our application uses a queue-like table to assign tasks to users and this
has worked well for us for a few years.  Now we are in the process of
adding some restrictions to which tasks a user can work on and that is
based on an attribute of each task that does not change for the task's
lifespan.  Users may have access to work on one or more types of tasks.
To improve query time when finding the set of tasks that we assign, we are
introducing partitioning into our task queue table.  When assigning tasks,
we issue an update statement to mark the tasks as reserved using a subquery
that orders the tasks by age.  With the introduction of partitioning, we
are seeing that the update statement affects more rows than expected.  An
example query is:

---
update task_parent
set reserved = true
from (
  select id
  from task_parent
  where reserved = false
and task_type = 1 or task_type = 2
  order by task_timestamp
  limit 50
  for update skip locked) as sub
where sub.id = task_parent.id
returning task_parent.id
---

In the statement above, we have a subquery to limit the number of tasks to
50 yet the update statement sometimes returns more than 50 records.  I have
narrowed this down to a small, reproducible example shown below.  The first
time I run the update statement I get ~65 records, then typically ~53 the
next few runs, and then it starts consistently giving me 50 records after
that.  Then if I bump the limit to 100 I will get more than 100 initially
and after several executions it starts to settle into always giving the
expected 100.

Below is the full setup that can be used to reproduce what I'm seeing.  It
was initially observed on PostgreSQL 11.8 but I can also reproduce it on
13.0.

---
create table task_parent (
  id bigint not null,
  task_type smallint not null,
  reserved boolean not null,
  task_timestamp timestamp not null
) partition by list (task_type);

create table task_child_1
partition of task_parent for values in (1);

create table task_child_2
partition of task_parent for values in (2);

insert into task_parent
select
  generate_series(1, 50),
  case when random() < 0.5 then 1 else 2 end,
  false,
  now() - (random() * '1 day'::interval);

create index task_parent_task_time_idx
on task_parent (task_timestamp);

update task_parent
set reserved = true
from (
  select id
  from task_parent
  where reserved = false
and task_type = 1 or task_type = 2
  order by task_timestamp
  limit 50
  for update skip locked) as sub
where sub.id = task_parent.id
returning task_parent.id;
---

A couple of interesting observations:
1) If I remove the order by clause I always get the expected number of
results
2) If I rewrite the query to use a CTE for the task IDs instead of a
subquery (sketched below) then I always get the expected number of results
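
For reference, the CTE rewrite from observation 2 is roughly shaped like
this (a sketch only, keeping the same predicate as above; the exact query we
use may differ slightly):

---
with sub as (
  select id
  from task_parent
  where reserved = false
    and task_type = 1 or task_type = 2
  order by task_timestamp
  limit 50
  for update skip locked
)
update task_parent
set reserved = true
from sub
where sub.id = task_parent.id
returning task_parent.id;
---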

At its surface, this seems like it could be a bug but maybe there is
something about this usage pattern that is known/expected to cause this
behavior.  So that's the question - is this a bug that should be reported
to pgsql-bugs, or is this expected and if so, why?

Craig


Re: Unexpected result count from update statement on partitioned table

2020-12-18 Thread Craig McIlwee
select min(task_timestamp), max(task_timestamp)
from updated;

Execution 1:
            min             |            max
----------------------------+----------------------------
 2020-12-17 11:44:51.192119 | 2020-12-17 11:45:03.881409

Execution 2:
            min             |            max
----------------------------+----------------------------
 2020-12-17 11:44:59.943108 | 2020-12-17 11:45:14.273185

min of execution 2 is older than max of execution 1 which should not happen.
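
For context, the statement above was along these lines (a sketch, since the
start of it was cut off in the archive - it assumes the update is wrapped in
a data-modifying CTE named "updated" that returns the affected timestamps):

---
with updated as (
  update task_parent
  set reserved = true
  from (
    select id
    from task_parent
    where reserved = false
    order by task_timestamp
    limit 50
    for update skip locked) as sub
  where sub.id = task_parent.id
  returning task_parent.task_timestamp
)
select min(task_timestamp), max(task_timestamp)
from updated;
---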

Craig

On Fri, Dec 18, 2020 at 4:06 AM Laurenz Albe wrote:

> On Thu, 2020-12-17 at 12:21 -0500, Craig McIlwee wrote:
> > Our application uses a queue-like table to assign tasks to users and
> this has worked well for us for a few years.  Now we are in the process of
> adding some restrictions to which tasks a user can
> > work on and that is based on an attribute of each task that does not
> change for the task's lifespan.  Users may have access to work on one or
> more types of tasks.  To improve query time when
> > finding the set of tasks that we assign, we are introducing partitioning
> into our task queue table.  When assigning tasks, we issue an update
> statement to mark the tasks as reserved using a subquery
> > that orders the tasks by age.  With the introduction of partitioning, we
> are seeing that the update statement affects more rows than expected.  An
> example query is:
> >
> > ---
> > update task_parent
> > set reserved = true
> > from (
> >   select id
> >   from task_parent
> >   where reserved = false
> > and task_type = 1 or task_type = 2
> >   order by task_timestamp
> >   limit 50
> >   for update skip locked) as sub
> > where sub.id = task_parent.id
> > returning task_parent.id
> > ---
> >
> > In the statement above, we have a subquery to limit the number of tasks
> to 50 yet the update statement sometimes returns more than 50 records.  I
> have narrowed this down to a small, reproducible
> > example shown below.  The first time I run the update statement I get
> ~65 records, then typically ~53 the next few runs, and then it starts
> consistently giving me 50 records after that.  Then if I
> > bump the limit to 100 I will get more than 100 initially and after
> several executions it starts to settle into always giving the expected 100.
> >
> > Below is the full setup that can be used to reproduce what I'm seeing.
> It was initially observed on PostgreSQL 11.8 but I can also reproduce it on
> 13.0.
> >
> > ---
> > create table task_parent (
> >   id bigint not null,
> >   task_type smallint not null,
> >   reserved boolean not null,
> >   task_timestamp timestamp not null
> > ) partition by list (task_type);
> >
> > create table task_child_1
> > partition of task_parent for values in (1);
> >
> > create table task_child_2
> > partition of task_parent for values in (2);
> >
> > insert into task_parent
> > select
> >   generate_series(1, 50),
> >   case when random() < 0.5 then 1 else 2 end,
> >   false,
> >   now() - (random() * '1 day'::interval);
> >
> > create index task_parent_task_time_idx
> > on task_parent (task_timestamp);
> >
> > update task_parent
> > set reserved = true
> > from (
> >   select id
> >   from task_parent
> >   where reserved = false
> > and task_type = 1 or task_type = 2
> >   order by task_timestamp
> >   limit 50
> >   for update skip locked) as sub
> > where sub.id = task_parent.id
> > returning task_parent.id;
> > ---
> >
> > A couple of interesting observations:
> > 1) If I remove the order by clause I always get the expected number of
> results
> > 2) If I rewrite the query to use a CTE for the task IDs instead of a
> subquery then I always get the expected number of results
> >
> > At its surface, this seems like it could be a bug but maybe there is
> something about this usage pattern that is known/expected to cause this
> behavior.  So that's the question - is this a bug that
> > should be reported to pgsql-bugs, or is this expected and if so, why?
>
> Yes, this must be a bug:
>
> EXPLAIN (COSTS OFF) update task_parent
> set reserved = true
> from (
>   select id
>   from task_parent
>   where reserved = false
> and task_type = 1 or task_type = 2
>   order by task_timestamp
>   limit 50
>   for update skip locked) as sub
> where sub.id = task_parent.id
> returning task_parent.id;
>
> QUERY PLAN
>
>
> 

Re: Database upgrade Pg11 to Pg12 issues

2021-02-03 Thread Craig McIlwee
(replying to the entire list instead of Bruce only this time...)


> This doesn't make sense to me.  Since we hard-linked, why would 12 be so
> much smaller?  If it was symlinks, I could imagine that, but it doesn't
> use symlinks, just hard links, so it should be similar.  Please look at
> the size of main/base on both, since that is where your data is.  Maybe
> 11 just has a lot of old WAL that isn't copied.
>
>
It depends on how the directory sizes were measured. "du" won't count the
same file more than once, so if space was measured using one of the
following commands:

du -hs /var/lib/postgresql/
du -hs /var/lib/postgresql/11/main /var/lib/postgresql/12/main

Then you would see a large value for the 11 directory and then the 12
directory would only show the files that weren't already counted when
measuring the 11 directory.  Running du on each version's directory
separately might give results that are more in line with expectations.

Craig


Re: Right version of jdbc

2023-09-29 Thread Craig McIlwee
> for some reason the postgresql jar is not in the classpath.

This is due to the way that Tomcat loads drivers, which is documented at
[1].  In short, the JDBC driver should be placed in the tomcat/lib
directory and removed from the application's WEB-INF/lib directory.  After
doing that, I was able to get past the "No suitable driver" exception.

OP, you should read that entire page, and in particular the portion at
[2].  By registering a "Resource" with Tomcat, you can let it manage a
connection pool for you (if you aren't already using something like
PgBouncer or pgpool) and, if registered at the container level, in the
future you can share that pool across all of your web applications in the
container.

[1]
https://tomcat.apache.org/tomcat-9.0-doc/jndi-datasource-examples-howto.html#DriverManager,_the_service_provider_mechanism_and_memory_leaks
[2]
https://tomcat.apache.org/tomcat-9.0-doc/jndi-datasource-examples-howto.html#PostgreSQL

Craig

>


Understanding max_locks_per_transaction

2023-10-16 Thread Craig McIlwee
We're using PostgreSQL 13.10, installed on CentOS 7 from PGDG RPMs.

Recently we've run into "out of shared memory" issues with a hint at
increasing max_locks_per_transaction.  The problem is well described in the
PostgreSQL documentation and various blog posts found around the internet,
and the solution is straightforward - touch fewer objects per transaction
or increase the size of the lock table.  The error occurs when joining
partitioned tables, and changing the query structure is something we'd like
to avoid, so we are going the route of increasing the size of the lock
table by increasing max_locks_per_transaction.
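
The change itself is trivial - something along these lines, with an
illustrative value only (it requires a full server restart to take effect):

---
alter system set max_locks_per_transaction = 256;  -- value for illustration only
---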

Many blog posts suggest against using an arbitrarily large value for
max_locks_per_transaction to avoid excess memory usage by the lock table.
Contrary to that is an email from Tom Lane [1] indicating that a lock table
with several million slots wouldn't be so bad.  A SO answer from Laurenz
Albe [2] tells us that a lock entry consumes 168 bytes so a table with 1m
slots would consume somewhere around 160MB (assuming we are on the same
architecture, but if not then at least still probably within an order of
magnitude).  I suppose I could set the max_locks_per_transaction value
really high, but, on the other hand, if I can make a pretty good estimate
about how many locks are actually needed then I can set a lower value and
let some of that RAM be used for other things like disk caching instead.
So here I am, trying to estimate the number of locks that would actually be
needed, but I have some questions and am running into some behavior that I
do not expect.

Most discussions regarding the lock table say that the size of the lock
table determines how many locks can be held.  The documentation for
max_locks_per_transaction [3] reads slightly differently though, and in
particular this phrase stands out to me:

> no more than this many distinct objects can be locked at any one time

To me, that seems to be saying that multiple locks for the same object
(e.g. for a single table) would only consume a single lock table entry.
Finally on to my first question: Am I interpreting the documentation
correctly, that multiple locks for the same object only consume a single
lock table entry, or am I reading too much into this and the size of the
lock table really does dictate the total number of locks regardless of
whether those locks point to the same object?  If my interpretation is
correct, then I can size the lock table to be slightly above the total
number of database objects and I should be safe.
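
A rough way to get that object count is to count relations in the catalog -
a sketch that ignores non-relation lockable objects (advisory locks,
transaction IDs, etc.):

---
-- counts tables, indexes, partitions, sequences, TOAST tables, views, ...
select count(*) from pg_class;
---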

I tried to test my understanding by creating two database connections and
in each connection issuing a query that touched a large number of tables
(enough to almost reach the size of the lock table).  My theory was that if
multiple locks for the same object only consume a single lock table entry,
then multiple connections could each issue the same big query that used a
large number of locks and no issue would arise.  For example, with the
default settings (max_connections = 100, max_prepared_transactions = 0,
max_locks_per_transaction = 64) I should have a lock table with 6400 slots
so two connections that both touch 6000 database objects should be able to
run concurrently.  That test succeeded, but led me to the next
observation...

To my surprise, during my testing I was able to issue queries that used
well above the maximum number of locks that I was expecting and those
queries completed without issue.  Below are the relevant database settings
and the results of a statement that locks many objects.

archive=# show max_connections ;
 max_connections
-----------------
 100
(1 row)

archive=# show max_prepared_transactions ;
 max_prepared_transactions
---------------------------
 0
(1 row)

archive=# show max_locks_per_transaction ;
 max_locks_per_transaction
---------------------------
 64
(1 row)
archive=# begin;
BEGIN
archive=*# explain 

archive=*# select count(*) as total, count(*) filter (where granted) as
granted from pg_locks;
 total | granted
-------+---------
  7568 |    7568
(1 row)
archive=*# select count(distinct relation) from pg_locks ;
 count
-------
  7576
(1 row)

According to the documentation, I should have a lock table with 6400
entries but somehow have been able to obtain 7576 locks.  So my second and
last question: how is this possible - shouldn't I have received an "out of
shared memory" error since I exceeded 6400 distinct locked objects?

[1] https://www.postgresql.org/message-id/25925.1669420...@sss.pgh.pa.us
[2] https://stackoverflow.com/a/65084207/2934470
[3]
https://www.postgresql.org/docs/current/runtime-config-locks.html#GUC-MAX-LOCKS-PER-TRANSACTION

Craig


Re: Understanding max_locks_per_transaction

2023-10-16 Thread Craig McIlwee
On Mon, Oct 16, 2023 at 2:32 PM Tom Lane  wrote:

> Craig McIlwee  writes:
> > Most discussions regarding the lock table say that the size of the lock
> > table determines how many locks can be held.  The documentation for
> > max_locks_per_transaction [3] reads slightly differently though, and in
> > particular this phrase stands out to me:
>
> >> no more than this many distinct objects can be locked at any one time
>
> > To me, that seems to be saying that multiple locks for the same object
> > (e.g. for a single table) would only consume a single lock table entry.
> > Finally on to my first question: Am I interpreting the documentation
> > correctly, that multiple locks for the same object only consume a single
> > lock table entry,
>
> Yes ... however it's a good deal more complicated than that.
>
> What actually happens under the hood is that we allocate enough shared
> memory space for (MaxBackends + max_prepared_transactions) *
> max_locks_per_transaction LOCK structs (which are the per-locked-object
> entries) and twice that many PROCLOCK structs, which are
> per-lock-per-holder information.  The 2X multiplier assumes that on
> average about two sessions will be holding/requesting locks on any
> specific locked object.
>

I think that explains why I could successfully issue the same large
statement from 2 concurrent transactions but then encounter an error when
trying a third.


>
> Now, MaxBackends is more than max_connections, because it also
> accounts for autovacuum workers, parallel workers, etc.  So that's
> one of the sources of the fuzzy limit you noticed.  The other source
> is that we allocate about 100K more shared memory space than we think
> we need, and it's possible for the lock tables to expand into that
> "slop" space.  I've not checked the sizes of these structs lately,
> but the slop space could surely accommodate several hundred more
> locks than the initial estimate allows.
>

OK, those 2 bits of fuzz look like they could increase the lock table size
by at least 1000 entries, maybe even 2000, which explains how I could get
~7500 locks without an error.  100K / 168 = 609 extra slots.  Assuming
MaxBackends is 10 higher than max_connections (wild guess there, but maybe
it's even higher?), then that's another 640 slots.
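
As a sanity check, the nominal table size from that formula can be computed
from the settings themselves - a sketch that substitutes max_connections for
MaxBackends, so it slightly underestimates the real allocation:

---
select (current_setting('max_connections')::int
        + current_setting('max_prepared_transactions')::int)
       * current_setting('max_locks_per_transaction')::int
       as approx_lock_slots;  -- 6400 with the defaults shown earlier
---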


> Even if you know how many lockable objects your installation has (which I
> bet you don't, or at least the number isn't likely to hold still for long)


Not exactly, but we have time based (monthly) partitioning and can make a
reasonable estimate on how many years an installation will run so our guess
wouldn't be too far off.  My plan was to make an estimate then increase by
50% or maybe even 100% to account for new objects that could be added later.


> it's pretty hard to say exactly how many PROCLOCK entries you might need.


Yeah, good point there, but I suppose our connection pool's relatively low
max connection limit could be used as an upper bound.


> And bloating the lock table size by max_connections/2 or so to try
> to brute-force that doesn't seem like a good plan.
>

I'm not following - where does max_connections/2 come from?


>
> I'd just raise max_locks_per_transaction until you stop seeing
> problems, and then maybe add a factor of two safety margin.
>

That's what we've already done for the short term solution.  It is somewhat
in conflict with your statement regarding the number of lockable objects
not holding still for long, though.  As time goes on and our scheduled jobs
automatically create new monthly partitions, or as our schema evolves, we
may eventually hit the limits again.  That's why we'd like to create some
formula that can estimate the max_locks_per_transaction value we should
configure (with the previously mentioned multiplier for safety / future
proofing).  An alternative would be to precreate all partitions we
anticipate needing so we don't get surprises down the line, but then we
incur extra planning cost for tables that will stay empty for months or
even years.

Craig


Re: pgAdmin 8.1 install

2024-01-10 Thread Craig McIlwee
>  Is there a problem with the installation of 8.1?  I’ve installed it 3
times but when I bring it up it still says I’m at 7.8.  The Help > About
menu shows 7.8 as well.

> It’s odd because I have gone from 7.6 to 7.7 and 7.7 to 7.8 following the
same steps.

When I look at the directory on my machine that pgAdmin is installed to, I
see the following:

C:\Users\ \AppData\Local\Programs\pgAdmin 4>dir
 Volume in drive C has no label.
 Volume Serial Number is 42DF-4B8A

 Directory of C:\Users\\AppData\Local\Programs\pgAdmin 4

11/28/2023  09:44 AM    <DIR>          .
11/28/2023  09:44 AM    <DIR>          ..
11/20/2023  12:26 PM            91,965 DEPENDENCIES
11/28/2023  09:43 AM    <DIR>          docs
11/28/2023  09:44 AM    <DIR>          installer
11/20/2023  12:26 PM             1,196 LICENSE
11/20/2023  12:26 PM            35,147 pgAdmin4.ico
11/28/2023  09:44 AM    <DIR>          python
11/28/2023  12:50 PM    <DIR>          runtime
11/20/2023  12:46 PM         3,417,277 sbom.json
11/28/2023  09:44 AM         4,766,927 unins000.dat
11/28/2023  09:43 AM         3,166,605 unins000.exe
02/01/2023  10:59 AM    <DIR>          v6
08/16/2023  01:17 PM    <DIR>          v7
11/28/2023  09:49 AM    <DIR>          web

Note that v6 and v7 are in subdirectories but v8 is not.  If you have an
existing shortcut that you are using, then perhaps it is pointed to the v7
directory and needs to be updated to point to the directory that v8 is
installed in.  That would also explain why you had no problems with minor
updates of v7 but are seeing this after moving to v8.

Craig

>


Re: Finding error in long input file

2024-07-09 Thread Craig McIlwee
Full error message from earlier in the thread:

> psql:scripts/insert-addrs.sql:488: ERROR:  syntax error at or near ";"
> LINE 488: ...2832,1,default,'85250 Red House Rd','Paisley','OR','97636');
>  ^


The error:
>
> LINE 488: ...2832,1,default,'85250 Red House Rd','Paisley','OR','97636')
>
> is giving you the line number and the data:
> a) Navigate to that line number using whatever method Joe has for that.
> b) Search for '85250 Red House Rd'.
>


The input file is 488 lines (presumably, since Rich said the file should
insert 488 rows).  It seems like too much of a coincidence that the last
character of the last line is really the error.  My guess is that there is
an unmatched character, perhaps a parenthesis, that is throwing off the
parser because it doesn't expect the statement to terminate yet.  Maybe
that unmatched char really is on the last line, but '85250 Red House Rd'
doesn't seem like the issue.  I don't know anything about the joe editor,
but I'd hope that any decent editor with syntax highlighting would make it
apparent where things went awry.

Craig


Re: a very naive question about table names in Postgres

2021-12-31 Thread Craig McIlwee
> I know that on my Mac the tables are kept in the data directory
> /Users/martinmueller/Library/Application Support/Postgres/var-13.  If I go
> there I see that internally the tables have numbers.  Somewhere in the
> system there must be a table that maps the name I gave to table X to the
> numerical identifier inside the database.
>

Check the pg_class table in the system catalog.  The numeric value is the
OID.

https://www.postgresql.org/docs/14/catalog-pg-class.html

Craig

>


Re: a very naive question about table names in Postgres

2021-12-31 Thread Craig McIlwee
>
> I know that on my Mac the tables are  kept in the  data directory
>> /Users/martinmueller/Library/Application Support/Postgres/var-13.  If I go
>> there I see that internally the tables have numbers.  Somewhere in the
>> system there must be a table that maps the name I gave to table X  to the
>> numerical identifier inside the database.
>>
>
> Check the pg_class table in the system catalog.  The numeric value is the
> OID.
>
> https://www.postgresql.org/docs/14/catalog-pg-class.html
>
> Craig
>

Whoops, spoke too soon.  I believe it's actually the relfilenode value that
you want.
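
To illustrate, assuming a table named 'mytable' (hypothetical name), both
values and the resulting file path can be seen with:

---
select oid, relfilenode from pg_class where relname = 'mytable';
-- path of the table's file, relative to the data directory
select pg_relation_filepath('mytable');
---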

Craig

>


Re: Trouble using pg_rewind to undo standby promotion

2024-11-07 Thread Craig McIlwee
>
>
>> Are you archiving WAL on the promoted machine in a way that your
>> restore_command can find it? Check archive_command and archive_mode on the
>> promoted machine.
>>
>
> No, the promoted machine is not archiving.  How should that work?  Is it
> OK for a log shipping standby that uses restore_command to also push to the
> same directory with an archive_command or would that cause issues of trying
> to read and write the same file simultaneously during WAL replay?  Or
> should I be setting up an archive_command that pushes to a separate
> directory and have a restore_command that knows to check both locations?
>
> Hmm, as I write that out, I realize that I could use archive_mode = on
> instead of archive_mode = always to avoid the potential for read/write
> conflicts during WAL replay.  I can try this later and report back.
>

Setting archive_mode = on and a restore_command that reads from the WAL
archive did the trick.  With those changes in place, I was able to
successfully run pg_rewind and get the promoted standby back onto timeline
1.  Thanks for the tips.
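
In case it helps anyone else, a quick way to confirm the timeline after the
rewind (run on the rewound server once it is back up) is:

---
select timeline_id from pg_control_checkpoint();
---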

Craig


Re: Trouble using pg_rewind to undo standby promotion

2024-11-07 Thread Craig McIlwee
On Thu, Nov 7, 2024 at 4:47 AM Torsten Förtsch wrote:

> Your point of divergence is in the middle of the 7718/00BF file. So,
> you should have 2 such files eventually, one on timeline 1 and the other on
> timeline 2.
>
> Are you archiving WAL on the promoted machine in a way that your
> restore_command can find it? Check archive_command and archive_mode on the
> promoted machine.
>

No, the promoted machine is not archiving.  How should that work?  Is it OK
for a log shipping standby that uses restore_command to also push to the
same directory with an archive_command or would that cause issues of trying
to read and write the same file simultaneously during WAL replay?  Or
should I be setting up an archive_command that pushes to a separate
directory and have a restore_command that knows to check both locations?

Hmm, as I write that out, I realize that I could use archive_mode = on
instead of archive_mode = always to avoid the potential for read/write
conflicts during WAL replay.  I can try this later and report back.

Also, do your archive/restore scripts work properly for history files?
>

The scripts don't do anything special with history files.  They are based
on the continuous archiving docs [1] and this [2] article, with a slight
modification to include a throttled scp since the log shipping server is
located in a different data center from the promoted standby and there is
limited bandwidth between the two.  (Also note that the archive script from
[2] is adapted to properly handle file transfer failures - the one in the
article uses the exit code of the rm command, so postgres won't be informed
when a file transfer fails, resulting in missing WAL in the archive.)

Archive script:
---
#!/bin/bash

# $1 = %p
# $2 = %f

limit=10240 # 10Mbps

gzip < /var/lib/pgsql/13/data/$1 > /tmp/archive/$2.gz

scp -l $limit /tmp/archive/$2.gz postgres@x.x.x.x:/data/wal_archive/operational/$2.gz
exit_code=$?

rm /tmp/archive/$2.gz

exit $exit_code
---

Restore script:
---
gunzip < /data/wal_archive/operational/$2.gz > $1
---

[1]
https://www.postgresql.org/docs/13/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS
[2]
https://www.rockdata.net/tutorial/admin-archive-command/#compressing-and-archiving

Craig

>


Trouble using pg_rewind to undo standby promotion

2024-11-06 Thread Craig McIlwee
I have a primary -> standby 1 -> standby 2 setup with all servers running
PG 13.8 (this effort is part of getting on to a newer version, but I think
those details aren't relevant to this problem).  The first standby uses
streaming replication from the primary and the second standby is using a
WAL archive with a restore_command.  To make this standby chain work,
standby 1 is set to archive_mode = always with a command that populates the
WAL archive.

I would like to be able to promote standby 2 (hereon referred to just as
'standby'), perform some writes, then rewind it back to the point before
promotion so it can become a standby again.  The documentation for
pg_rewind says that this is supported and it seems like it should be
straightforward, but I'm not having any luck getting this to work so I'm
hoping someone can point out what I'm doing wrong.  Here's what I did:

First, observe that WAL is properly being applied from the archive.  Note
that we are currently on timeline 1.

2024-11-06 09:51:23.286 EST [5438] LOG:  restored log file
"0001771100F9" from archive
2024-11-06 09:51:23.434 EST [5438] LOG:  restored log file
"0001771100FA" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0001771100FB.gz: No such file or
directory
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0002.history.gz: No such file or directory

Next, stop postgres, set wal_log_hints = on as required by pg_rewind, and
restart postgres.  I also make a copy of the data directory while the
postgres is not running so I can repeat my test, which works fine on a
small test database but won't be possible for the multi TB database that I
will eventually be doing this on.

Now promote the standby using "select pg_promote()" and see that it
switches to a new timeline.  You can also see that the last WAL applied
from the archive is 7718/BF.

2024-11-06 12:10:10.831 EST [4336] LOG:  restored log file
"0001771800BD" from archive
2024-11-06 12:10:10.996 EST [4336] LOG:  restored log file
"0001771800BE" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0001771800BF.gz: No such file or
directory
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0002.history.gz: No such file or directory
2024-11-06 12:10:15.384 EST [4336] LOG:  restored log file
"0001771800BF" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0001771800C0.gz: No such file or
directory
2024-11-06 12:10:15.457 EST [4336] LOG:  received promote request
2024-11-06 12:10:15.457 EST [4336] LOG:  redo done at 7718/BF30
2024-11-06 12:10:15.457 EST [4336] LOG:  last completed transaction was at
log time 2024-11-06 12:10:22.627074-05
2024-11-06 12:10:15.593 EST [4336] LOG:  restored log file
"0001771800BF" from archive
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0002.history.gz: No such file or directory
2024-11-06 12:10:15.611 EST [4336] LOG:  selected new timeline ID: 2
2024-11-06 12:10:15.640 EST [4336] LOG:  archive recovery complete
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0001.history.gz: No such file or directory
2024-11-06 12:10:17.028 EST [4329] LOG:  database system is ready to accept
connections

Next, insert a record just to make some changes that I want to roll back
later.  (What I will eventually be doing is creating a publication to ship
data to a newer version, but again, that's not what's important here.)

Finally, shutdown postgres and attempt a rewind.  The address used in the
--source-server connection string is the address of the primary.

2024-11-06 12:11:11.139 EST [4329] LOG:  received fast shutdown request
2024-11-06 12:11:11.143 EST [4329] LOG:  aborting any active transactions
2024-11-06 12:11:11.144 EST [4329] LOG:  background worker "logical
replication launcher" (PID 5923) exited with exit code 1
2024-11-06 12:11:40.933 EST [4342] LOG:  shutting down
2024-11-06 12:11:41.753 EST [4329] LOG:  database system is shut down

/usr/pgsql-13/bin/pg_rewind --target-pgdata=/data/pgsql/operational
--source-server="host=x.x.x.x dbname=postgres user=xxx password=xxx"
--dry-run --progress --restore-target-wal

pg_rewind: connected to server
pg_rewind: servers diverged at WAL location 7718/BFE8 on timeline 1
/data/wal_archive/restore_operational.sh: line 2:
/data/wal_archive/operational/0002771800BF.gz: No such file or
directory
pg_rewind: error: could not restore file "0002771800BF" from
archive
pg_rewind: fatal: could not find previous WAL record at 7718/BFE8

pg_rewind shows the point of divergence as 7718/BF on timeline 1, but when
it tries to replay WAL using the restore command it is trying to find WAL
from timeline 2 rather than picking back up on tim