Hi,
I have set up logical replication from an AWS RDS instance to a postgreSQL
database on an EC2 instance. I am getting an “out of memory” error during the
initial for one of the tables. I have include a snippet of the error at the
bottom of this email. The snippet below repeats every four or eight hours,
depending on the work_mem and maintenance_work_mem values. The other 500 plus
tables load with no problem.
Does anyone have experience with this issue. Suggestions on what to try would
be greatly appreciated.
Both databases are running PostgreSQL 11.5.
The database is 5 TB in size and has over 500 tables. The "out of memory"
error every eight hours. The table is 578GB in the RDS database and ate up
over 9TB before it ran out of space on the EC2 instance. The table has a bytea
column but there is no data in that column. It also has a text column and the
largest text length is 978MB.
The odd thing is that this is not the largest table in the database. The
largest table took 57 hours to initial load and it does have bytea data. The
largest table is about 4 times larger from a size perspective. I have
increases the work_mem and the maintenance_work_mem parameters and tried
loading only the problem table. This resulted in the “out of memory” error
occurring every 4 hours instead of every eight hours. Note that the source
database is static (no changes).
020-06-17 07:52:28.618 UTC [8410] LOCATION: LogCheckpointStart, xlog.c:8508
2020-06-17 07:52:52.505 UTC [8410] LOG: 0: checkpoint complete: wrote
69088 buffers (1.6%); 0 WAL file(s) added, 0 removed, 42 recycled; write=23.548
s, sync=0.271 s, total=23.886 s; sync files=14, longest=0.165 s, average=0.019
s; distance=688856 kB, estimate=701162 kB
2020-06-17 07:52:52.505 UTC [8410] LOCATION: LogCheckpointEnd, xlog.c:8590
2020-06-17 07:53:15.960 UTC [8410] LOG: 0: checkpoint starting: xlog
2020-06-17 07:53:15.960 UTC [8410] LOCATION: LogCheckpointStart, xlog.c:8508
2020-06-17 07:53:23.933 UTC [14390] ERROR: XX000: could not receive data from
WAL stream: ERROR: out of memory
DETAIL: Cannot enlarge string buffer containing 1073741802 bytes by 28
more bytes.
2020-06-17 07:53:23.933 UTC [14390] CONTEXT: COPY product, line 15568244
2020-06-17 07:53:23.933 UTC [14390] LOCATION: libpqrcv_receive,
libpqwalreceiver.c:772
2020-06-17 07:53:24.160 UTC [19873] LOG: 0: logical replication table
synchronization worker for subscription "subscription_test_tables", table
“product" has started
2020-06-17 07:53:24.160 UTC [19873] LOCATION: ApplyWorkerMain, worker.c:1662