Logical Replication Issue

2020-06-30 Thread Donzell White
Hi,
I have set up logical replication from an AWS RDS instance to a postgreSQL 
database on an EC2 instance. I am getting an “out of memory” error during the 
initial for one of the tables. I have include a snippet of the error at the 
bottom of this email. The snippet below repeats every four or eight hours, 
depending on the work_mem and maintenance_work_mem values. The other 500 plus 
tables load with no problem. 

Does anyone have experience with this issue. Suggestions on what to try would 
be greatly appreciated.

Both databases are running PostgreSQL 11.5.
The database is 5 TB in size and has over 500 tables.  The "out of memory" 
error every eight hours.  The table is 578GB in the RDS database and ate up 
over 9TB before it ran out of space on the EC2 instance.  The table has a bytea 
column but there is no data in that column. It also has a text column and the 
largest text length is 978MB.  

The odd thing is that this is not the largest table in the database. The 
largest table took 57 hours to initial load and it does have bytea data.  The 
largest table is about 4 times larger from a size perspective.  I have 
increases the work_mem and the maintenance_work_mem parameters and tried 
loading only the problem table.  This resulted in the “out of memory” error 
occurring every 4 hours instead of every eight hours. Note that the source 
database is static (no changes).

020-06-17 07:52:28.618 UTC [8410] LOCATION:  LogCheckpointStart, xlog.c:8508
2020-06-17 07:52:52.505 UTC [8410] LOG:  0: checkpoint complete: wrote 
69088 buffers (1.6%); 0 WAL file(s) added, 0 removed, 42 recycled; write=23.548 
s, sync=0.271 s, total=23.886 s; sync files=14, longest=0.165 s, average=0.019 
s; distance=688856 kB, estimate=701162 kB
2020-06-17 07:52:52.505 UTC [8410] LOCATION:  LogCheckpointEnd, xlog.c:8590
2020-06-17 07:53:15.960 UTC [8410] LOG:  0: checkpoint starting: xlog
2020-06-17 07:53:15.960 UTC [8410] LOCATION:  LogCheckpointStart, xlog.c:8508
2020-06-17 07:53:23.933 UTC [14390] ERROR:  XX000: could not receive data from 
WAL stream: ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1073741802 bytes by 28 
more bytes.
2020-06-17 07:53:23.933 UTC [14390] CONTEXT:  COPY product, line 15568244
2020-06-17 07:53:23.933 UTC [14390] LOCATION:  libpqrcv_receive, 
libpqwalreceiver.c:772
2020-06-17 07:53:24.160 UTC [19873] LOG:  0: logical replication table 
synchronization worker for subscription "subscription_test_tables", table 
“product" has started
2020-06-17 07:53:24.160 UTC [19873] LOCATION:  ApplyWorkerMain, worker.c:1662



Re: Logical Replication Issue

2020-06-30 Thread Donzell White
My table has a primary key.  In addition, this is an initial load issue where 
the replication identity should not matter.


-Original Message-
From: Michael Lewis 
To: Donzell White 
Cc: pgsql-general 
Sent: Tue, Jun 30, 2020 1:05 pm
Subject: Re: Logical Replication Issue

Per the release notes, there are some enhancements to logical replication that 
came after 11.5 like 11.8 particularly related to replication identity full. Do 
you have a primary key or unique index that is being used for the replication 
identity?