Critical errors during logical decoding
List, For seemingly no reason at all, my logical replication slot has just started bailing out with errors every time my client tries to connect to it. Particularly - at this moment - I'm getting: 2018-02-07 19:14:31 UTC [3985-1] svc_app@app LOG: 0: received replication command: START_REPLICATION SLOT event_stream LOGICAL 0/0 ("startup_params_format" '1', "no_txinfo" 'true', "expected_encoding" 'UTF8', "max_proto_version" '1', "proto_format" 'json', "min_proto_version" '1') 2018-02-07 19:14:31 UTC [3985-2] svc_app@app LOCATION: exec_replication_command, walsender.c:1341 2018-02-07 19:14:31 UTC [3985-3] svc_app@app LOG: 0: starting logical decoding for slot "event_stream" 2018-02-07 19:14:31 UTC [3985-4] svc_app@app DETAIL: streaming transactions committing after 0/8DDFD48, reading WAL from 0/8DDDC00 2018-02-07 19:14:31 UTC [3985-5] svc_app@app LOCATION: CreateDecodingContext, logical.c:408 2018-02-07 19:14:31 UTC [3985-6] svc_app@app LOG: 0: logical decoding found consistent point at 0/8DDDC00 2018-02-07 19:14:31 UTC [3985-7] svc_app@app DETAIL: There are no running transactions. 2018-02-07 19:14:31 UTC [3985-8] svc_app@app LOCATION: SnapBuildFindSnapshot, snapbuild.c:1245 2018-02-07 19:14:31 UTC [3985-9] svc_app@app ERROR: XX000: no known snapshots Other errors have included missing toast entries, as well as missing base/ files on disk. I blew away the entire data directory, and restored the database from a pg_dump to a fresh directory after checking filesystem consistency. The logical replication stream operates for a few minutes, and then eventually bails out again. This is Postgres 9.6.6 on Ubuntu installed from packages. Does anyone have any insight into what could be happening here? Other steps to try and rectify the problem? The database itself does not appear to have any issues that I can see. Best, Colin
Re: Critical errors during logical decoding
Appears to be a problem with pglogical_output when de-toasting columns. Switched to wal2json and the problem went away. I believe I may have found the issue anyway. Apologies for the noise. Best, Colin On Wed, Feb 7, 2018 at 2:20 PM, Colin Morelli wrote: > List, > > For seemingly no reason at all, my logical replication slot has just > started bailing out with errors every time my client tries to connect to > it. Particularly - at this moment - I'm getting: > > 2018-02-07 19:14:31 UTC [3985-1] svc_app@app LOG: 0: received > replication command: START_REPLICATION SLOT event_stream LOGICAL 0/0 > ("startup_params_format" '1', "no_txinfo" 'true', "expected_encoding" > 'UTF8', "max_proto_version" '1', "proto_format" 'json', "min_proto_version" > '1') > 2018-02-07 19:14:31 UTC [3985-2] svc_app@app LOCATION: > exec_replication_command, walsender.c:1341 > 2018-02-07 19:14:31 UTC [3985-3] svc_app@app LOG: 0: starting > logical decoding for slot "event_stream" > 2018-02-07 19:14:31 UTC [3985-4] svc_app@app DETAIL: streaming > transactions committing after 0/8DDFD48, reading WAL from 0/8DDDC00 > 2018-02-07 19:14:31 UTC [3985-5] svc_app@app LOCATION: > CreateDecodingContext, logical.c:408 > 2018-02-07 19:14:31 UTC [3985-6] svc_app@app LOG: 0: logical > decoding found consistent point at 0/8DDDC00 > 2018-02-07 19:14:31 UTC [3985-7] svc_app@app DETAIL: There are no > running transactions. > 2018-02-07 19:14:31 UTC [3985-8] svc_app@app LOCATION: > SnapBuildFindSnapshot, snapbuild.c:1245 > 2018-02-07 19:14:31 UTC [3985-9] svc_app@app ERROR: XX000: no known > snapshots > > Other errors have included missing toast entries, as well as missing base/ > files on disk. I blew away the entire data directory, and restored the > database from a pg_dump to a fresh directory after checking filesystem > consistency. The logical replication stream operates for a few minutes, and > then eventually bails out again. This is Postgres 9.6.6 on Ubuntu installed > from packages. > > Does anyone have any insight into what could be happening here? Other > steps to try and rectify the problem? The database itself does not appear > to have any issues that I can see. > > Best, > Colin >
Re: Critical errors during logical decoding
Hey Tomas, We're using v1.1.0 of pglogical. I can't honestly say what could reproduce this issue as I'm not entirely sure and haven't tested it yet. I had thought it was caused when a toasted row is deleted before the pglogical_output plugin processes the event (so it fails when going to de-toast the row), but I would imagine that postgres internally should prevent deleting data that would be needed by a replication stream. I must admit I'm not well versed in this area, though. My understanding is wal2json doesn't detoast the data at all, and thus doesn't have an issue (for our case, we really just need the json data with an identifier telling us that the row changed, we're not doing full logical replication). Best, Colin On Wed, Feb 7, 2018 at 4:18 PM, Tomas Vondra wrote: > On 02/07/2018 09:07 PM, Colin Morelli wrote: > > Appears to be a problem with pglogical_output when de-toasting columns. > > Switched to wal2json and the problem went away. I believe I may have > > found the issue anyway. Apologies for the noise. > > > > What pglogical version are you using? Can you share an example of data > causing this issue (or even better a script reproducing it)? > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services >