Re: Error on vacuum: xmin before relfrozenxid
Hi!

> On 24 May 2018, at 0:55, Paolo Crosato wrote:
>
> 1) VACUUM FULL was issued after the first time the error occurred, and a couple of times later. CLUSTER was never run.
> 2) Several failover tests were performed before the cluster was moved to production. However, before the move, the whole cluster was wiped, including all the application and monitoring users. After the db was moved to production, a couple of users were added without any problem.
> 3) No, even if the replication level is set to logical in postgresql.conf, we only use streaming replication.

I've encountered a seemingly similar ERROR:

[ 2018-05-22 15:04:03.270 MSK ,,,281756,XX001 ]:ERROR: found xmin 747375134 from before relfrozenxid 2467346321
[ 2018-05-22 15:04:03.270 MSK ,,,281756,XX001 ]:CONTEXT: automatic vacuum of table "postgres.pg_catalog.pg_database"

The table pg_database had probably not been changed at all over a long period of the database's use. Unfortunately, I found this out only when there were a million xids left and had to VACUUM FREEZE the database in single-user mode as soon as possible. I will probably be able to restore the database from backups and inspect it, if necessary, though the first occurrence of this error was already beyond the recovery window.

Best regards, Andrey Borodin.
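[Editor's aside, not part of the original report: a minimal monitoring sketch for catching approaching transaction ID wraparound before single-user mode becomes necessary. Thresholds and alerting are left out; adjust to taste.]

    -- Per-database XID age; values approaching ~2 billion mean a forced
    -- shutdown for wraparound protection is near.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;

    -- XID age of the catalog table named in the error message above.
    SELECT c.oid::regclass AS rel, age(c.relfrozenxid) AS xid_age
    FROM pg_class c
    WHERE c.relname = 'pg_database' AND c.relkind = 'r';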
Commit to primary with unavailable sync standby
Hi!

I cannot figure out the proper way to implement a safe HA upsert. I will be very grateful if someone would help me.

Imagine we have a primary server after a failover. It is network-partitioned. We are doing an INSERT ... ON CONFLICT DO NOTHING that eventually times out:

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t( pk, v, dt )
    VALUES ( 5, 'text', now() )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk, v, dt)
SELECT new_doc.pk from new_doc;
^CCancel request sent
WARNING:  01000: canceling wait for synchronous replication due to user request
DETAIL:  The transaction has already committed locally, but might not have been replicated to the standby.
LOCATION:  SyncRepWaitForLSN, syncrep.c:264
Time: 2173.770 ms (00:02.174)

Here our driver decided that something went wrong, so we retry the query:

az1-grx88oegoy6mrv2i/db1 M > WITH new_doc AS (
    INSERT INTO t( pk, v, dt )
    VALUES ( 5, 'text', now() )
    ON CONFLICT (pk) DO NOTHING
    RETURNING pk, v, dt)
SELECT new_doc.pk from new_doc;
 pk
----
(0 rows)

Time: 4.785 ms

Now we have split-brain, because we acknowledged that row to the client. How can I fix this? There must be some obvious trick, but I cannot see it... Or maybe cancelling the wait for synchronous replication should be disallowed, and termination should be treated as a system failure?

Best regards, Andrey Borodin.
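[Editor's aside: a hedged sketch, assuming PostgreSQL 10+ catalog names, of what a retrying client or driver could inspect on the primary after the cancelled synchronous-commit wait; whether this is an acceptable policy is exactly the open question above.]

    -- How far behind the current WAL position are the standbys? An empty
    -- result (as during the network partition described here) means no
    -- standby has confirmed anything, so the locally committed row may be
    -- lost on failover.
    SELECT application_name,
           sync_state,
           pg_current_wal_lsn() AS primary_lsn,
           flush_lsn,
           pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS bytes_behind
    FROM pg_stat_replication;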
Re: Commit to primary with unavailable sync standby
Hi Fabio! Thanks for looking into this.

> On 19 Dec 2019, at 17:14, Fabio Ugo Venchiarutti wrote:
>
> You're hitting the CAP theorem ( https://en.wikipedia.org/wiki/CAP_theorem )
>
> You cannot do it with fewer than 3 nodes, as the moment you set your standby to synchronous to achieve consistency, both your nodes become single points of failure.

We have 3 nodes, and the problem is reproducible with all standbys being synchronous.

> With 3 or more nodes you can perform what is called a quorum write against ( floor(<number of nodes> / 2) + 1 ) nodes.

The problem seems to be reproducible with quorum commit too.

> With 3+ nodes, the "easy" strategy is to set a number of standby nodes in synchronous_standby_names ( https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-SYNCHRONOUS-STANDBY-NAMES )
>
> This however makes it tricky to pick the correct standby for promotions during auto-failovers, as you need to freeze all the standbys listed in the above setting in order to correctly determine which one has the highest WAL location without running into race conditions (as the operation is non-atomic, stateful and sticky).

After promotion of any standby we can still commit to the old primary with the combination of cancel and retry.

> I personally prefer to designate a fixed synchronous set at setup time and automatically set a static synchronous_standby_names on the master whenever a failover occurs. That allows for a simpler failover mechanism as you know they got the latest WAL location.

No, a synchronous standby does not necessarily have the latest WAL. It has a WAL position no earlier than all commits acknowledged to the client.

Thanks!

Best regards, Andrey Borodin.
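[Editor's aside: for reference, the quorum-commit setup discussed above can be expressed like this; the standby names are illustrative, not from the thread.]

    -- Require acknowledgement from any 2 of the 3 listed standbys before a
    -- commit is reported to the client as successful.
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 2 (node1, node2, node3)';
    SELECT pg_reload_conf();

As the thread points out, this alone does not prevent the cancel-and-retry scenario: once the wait for synchronous replication is cancelled, the transaction is already committed and visible locally.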