Pg_rewind cannot load history wal

2018-08-01 Thread Richard Schmidt
We have been struggling to get pg_rewind to work. In desperation, we have been
trying to make the use case as simple as possible :)

We have two database servers running Postgres 10 on two different machines in
the normal primary/standby configuration.
Both machines write their WAL archive logs to the same shared drive (called 
/ice_dev/wal_archive).
The configuration has the following settings:
  archive_mode = always
  archive_command = 'test ! -f /ice-dev/wal_archive/%f && cp %p /ice-dev/wal_archive/%f'
  full_page_writes = on
  wal_log_hints = on
Checksums are enabled.
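With archive_mode = always, both servers try to archive the same WAL file names into the one shared directory, so whichever server comes second finds the file already there, `test ! -f` fails, and its archiver keeps retrying that file instead of moving on to later ones. As far as I understand the PostgreSQL documentation on continuous archiving in standby, a shared archive with archive_mode = always needs an archive_command that reports success when the identical file is already archived, while still refusing to overwrite a file with different contents. The Python sketch below shows that contract; archive_segment is a hypothetical helper name, not part of PostgreSQL, and a local directory stands in for the shared drive.

```python
import filecmp
import os
import shutil


def archive_segment(src_path: str, archive_dir: str, file_name: str) -> int:
    """Hypothetical archive_command helper for a shared archive.

    Returns 0 on success and 1 on failure, mirroring the exit status
    PostgreSQL expects from archive_command.
    """
    dest = os.path.join(archive_dir, file_name)
    if os.path.exists(dest):
        # The other server already archived this name. Report success only if
        # the contents are byte-for-byte identical; never overwrite a
        # different file.
        return 0 if filecmp.cmp(src_path, dest, shallow=False) else 1
    # Copy under a temporary name, then rename, so a half-written file never
    # appears under the final WAL file name. (fsync and the race between two
    # servers copying the same new file at the same moment are ignored here.)
    tmp_path = dest + ".tmp"
    shutil.copyfile(src_path, tmp_path)
    os.replace(tmp_path, dest)
    return 0
```

To use something like this, a small hypothetical wrapper script would call `sys.exit(archive_segment(...))` with the `%p`, archive directory and `%f` arguments. It may also be relevant to why a history file never shows up in the archive: while an earlier file keeps failing, nothing queued behind it gets archived.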

Our procedure that runs on machine A and B is as follows:


  1.  Build new databases on A and B, and configure A as the primary and B as
the standby.
  2.  Make some changes on A (the primary) and check that they are replicated
to B (the standby).
  3.  Promote B to be the new primary.
  4.  Switch off A (the original primary).
  5.  Add a replication slot on B (the new primary) for A (soon to be the standby).
  6.  Add a recovery.conf to A (soon to be the standby). The file contains
recovery_target_timeline = 'latest' and restore_command = 'cp
/ice-dev/wal_archive/%f "%p"'.
  7.  Run pg_rewind on A. This appears to work, as it returns the message
'source and target cluster are on the same timeline no rewind required'.
  8.  Start up server A (now a slave).
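The message in step 7 comes from pg_rewind's first check: it compares the timeline recorded with the latest checkpoint in each cluster's control file, and if they match it stops without rewinding. The Python model below is a simplified sketch of that one comparison and not pg_rewind's real code; ControlData and timelines_differ are hypothetical names. One point worth checking: if pg_rewind was pointed at the running new primary (`--source-server`) straight after promotion, B's control file may still record timeline 1 until B completes its first checkpoint on timeline 2, so the check can report "same timeline" even though A and B have diverged. Running CHECKPOINT on B before pg_rewind avoids this.

```python
from dataclasses import dataclass


@dataclass
class ControlData:
    """Hypothetical, simplified stand-in for pg_control: only the timeline
    recorded with the latest checkpoint matters for this check."""
    checkpoint_timeline: int


def timelines_differ(source: ControlData, target: ControlData) -> bool:
    """Simplified model of pg_rewind's first check (not its real code).

    False means pg_rewind reports that source and target are on the same
    timeline and finishes with 'no rewind required'. True means it goes on
    to look for the point where the timelines diverged."""
    return source.checkpoint_timeline != target.checkpoint_timeline


# Just after promotion, before B has finished a checkpoint, B's control file
# still records timeline 1, as does A's, so the rewind is skipped.
print(timelines_differ(ControlData(1), ControlData(1)))  # False
# After a CHECKPOINT on B, its control file records timeline 2.
print(timelines_differ(ControlData(2), ControlData(1)))  # True
```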

At this point A is in read-only mode but is not replicating. Its log contains
the following repeating messages:

2018-08-01 20:30:58 UTC [7257]: [1] user=,db=,app=,client= FATAL:  could not start WAL streaming: ERROR:  requested starting point 0/6000000 on timeline 1 is not in this server's history
DETAIL:  This server's history forked from timeline 1 at 0/57639D0.
cp: cannot stat '/ice-dev/wal_archive/00000002.history': No such file or directory
cp: cannot stat '/ice-dev/wal_archive/00000003.history': No such file or directory
cp: cannot stat '/ice-dev/wal_archive/00000002.history': No such file or directory
2018-08-01 20:30:58 UTC [6840]: [48] user=,db=,app=,client= LOG:  new timeline 2 forked off current database system timeline 1 before current recovery point 0/6000098
cp: cannot stat '/ice-dev/wal_archive/000000010000000000000006': No such file or directory
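The file names in these messages follow PostgreSQL's fixed naming scheme. A timeline history file is the timeline ID as eight hex digits plus `.history`. A WAL segment file is the timeline ID followed by two eight-hex-digit fields derived from the LSN. The Python sketch below shows that mapping. It assumes the default 16 MB segment size (in PostgreSQL 10 this is fixed when the server is built), and the function names are hypothetical. It shows, for example, that the fork point 0/57639D0 lies in segment 5 of timeline 1, so any timeline-1 WAL that A wrote after that point, such as the later segment its streaming request starts in, is not part of B's history.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default 16 MB segments (assumption)


def parse_lsn(text: str) -> int:
    """Turn an LSN written as 'high/low' hex, e.g. '0/57639D0', into an int."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains `lsn` on `timeline`."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4 GB "log id"
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


def history_file_name(timeline: int) -> str:
    """Name of a timeline history file, e.g. 00000002.history."""
    return "%08X.history" % timeline
```

For example, `wal_file_name(1, parse_lsn("0/57639D0"))` gives `000000010000000000000005`, a streaming start point at the beginning of segment 6 on timeline 1 maps to `000000010000000000000006`, and `history_file_name(2)` gives `00000002.history`.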

We can see the 00000002.history file in B's WAL directory, but it never
appears in the wal_archive directory, not even if we issue a checkpoint or
restart the server.
00000003.history does not appear to exist on either machine.
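With recovery_target_timeline = 'latest', the standby looks for newer timelines by probing for history files one at a time. It starts just above its current timeline and stops at the first file it cannot find, asking restore_command first and falling back to its own pg_wal. A "cannot stat" for a history file that does not exist yet, such as the one for timeline 3 here, is therefore an expected part of that search rather than an error in itself. The Python model below is a simplified sketch of the probing loop. It assumes a set of file names stands in for the archive plus pg_wal, and find_newest_timeline is a hypothetical name loosely modelled on the server's search, not its actual code.

```python
from __future__ import annotations


def find_newest_timeline(start_tli: int, available: set[str]) -> tuple[int, list[str]]:
    """Return the newest timeline found and the history files probed, in order."""
    newest = start_tli
    probed: list[str] = []
    probe = start_tli + 1
    while True:
        name = "%08X.history" % probe
        probed.append(name)
        if name not in available:
            # The first missing history file ends the search.
            break
        newest = probe
        probe += 1
    return newest, probed
```

With nothing available, the search stays on timeline 1 after a single probe for `00000002.history`. With `00000002.history` present, it moves to timeline 2 and then probes for `00000003.history`.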

Any ideas what we are doing wrong?
Thanks. Richard

This email and any attachments may contain confidential information. If you are 
not the intended recipient, your use or communication of the information is 
strictly prohibited. If you have received this message in error please notify 
MetService immediately.


FW: Pg_rewind cannot load history wal

2018-08-02 Thread Richard Schmidt
> Now once your master A can’t become slave of B.

Isn’t that the exact situation that pg_rewind should take care of?




FW: Pg_rewind cannot load history wal

2018-08-08 Thread Richard Schmidt
We think we have found our missing step. We needed to do an ordered shutdown of
the original primary before promoting the standby, i.e.:

>2. Make some changes on A (the primary) and check that they are replicated
>to B (the standby)

Missing step:
 Perform an ordered shutdown of A (the primary)

>3. Promote B to be the new primary

This now means that we have this simple use case working.
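The corrected order from the message above can be written down as a dry-run checklist. The Python sketch below only builds and prints the commands and runs nothing. The data directory, host name, user and slot name are hypothetical placeholders, and the CHECKPOINT on B is an extra precaution that is not in the original procedure. As far as I understand it, the clean shutdown has to come first because a primary that shuts down cleanly waits for its connected standbys to receive all of its WAL, including the shutdown checkpoint. B's timeline 2 then forks off after A's last record, provided B is still connected and streaming at that moment. `pg_ctl stop -m fast` is one of pg_ctl's clean shutdown modes. `recovery.conf` is written after pg_rewind because pg_rewind can replace or remove files in A's data directory.

```python
from __future__ import annotations

# Hypothetical placeholders for illustration; replace with the real values.
PGDATA = "/path/to/pgdata"                         # same layout assumed on A and B
B_CONNINFO = "host=node-b user=postgres dbname=postgres"
SLOT = "node_a_slot"
ARCHIVE = "/ice-dev/wal_archive"                   # from the original configuration

# recovery.conf for A (PostgreSQL 10 parameter names).
RECOVERY_CONF = (
    "standby_mode = 'on'\n"
    f"primary_conninfo = '{B_CONNINFO}'\n"
    f"primary_slot_name = '{SLOT}'\n"
    f"restore_command = 'cp {ARCHIVE}/%f \"%p\"'\n"
    "recovery_target_timeline = 'latest'\n"
)


def failover_plan() -> list[tuple[str, str]]:
    """Return (node, command) pairs in order. Nothing is executed."""
    return [
        # 1. Ordered (clean) shutdown of the old primary while B is still streaming.
        ("A", f"pg_ctl stop -D {PGDATA} -m fast"),
        # 2. Promote the standby.
        ("B", f"pg_ctl promote -D {PGDATA}"),
        # 3. Extra precaution: make B's control file record the new timeline.
        ("B", 'psql -c "CHECKPOINT"'),
        # 4. Replication slot for A on the new primary.
        ("B", f"psql -c \"SELECT pg_create_physical_replication_slot('{SLOT}')\""),
        # 5. Rewind A against B in case its WAL diverged (A must be shut down cleanly).
        ("A", f"pg_rewind --target-pgdata={PGDATA} --source-server='{B_CONNINFO}'"),
        # 6. Placeholder step: write RECOVERY_CONF into A's data directory now.
        ("A", f"write {PGDATA}/recovery.conf"),
        # 7. Start A as a standby of B.
        ("A", f"pg_ctl start -D {PGDATA}"),
    ]


for node, command in failover_plan():
    print(f"[{node}] {command}")
print(RECOVERY_CONF)
```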