OOM killer while pg_restore
Hello,

We have a pg_restore which fails due to RAM over-consumption by the corresponding PG backend, which ends up with the OOM killer.

The table has one PK, one index, and 3 FK constraints, all active while restoring. The dump contains over 200M rows for that table and is in custom format, which corresponds to 37 GB of total relation size in the original DB.

While importing, one can see the RSS + swap of the backend (the one executing the COPY) increasing linearly.

On my machine (quite an old PC), it failed after 16 hours, while the disk usage was reaching 26 GB and memory usage was 9.1g (RSS+swap).

If we do the same test, first dropping the 5 constraints on the table, the restore takes less than 15 minutes!

This was tested on both PG 14.2 and PG 13.6 (Linux 64-bit machines).

Is there a memory leak, or is it normal that a backend process may exhaust the RAM to such an extent?

Thanks

Regards
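A minimal sketch of how that growth can be watched on Linux while the COPY runs (assumes the backend's PID has already been looked up, e.g. in pg_stat_activity; 358995 is just the example PID that appears in the server log below):

$ grep -E 'VmRSS|VmSwap' /proc/358995/status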
Re: OOM killer while pg_restore
On Thu, Mar 3, 2022 at 05:59, Marc Rechté wrote:
> Hello,
>
> We have a pg_restore which fails due to RAM over-consumption by the
> corresponding PG backend, which ends up with the OOM killer.
>
> The table has one PK, one index, and 3 FK constraints, all active while
> restoring.
> The dump contains over 200M rows for that table and is in custom format,
> which corresponds to 37 GB of total relation size in the original DB.
>
> While importing, one can see the RSS + swap of the backend (the one
> executing the COPY) increasing linearly.
>
> On my machine (quite an old PC), it failed after 16 hours, while the disk
> usage was reaching 26 GB and memory usage was 9.1g (RSS+swap)
>
> If we do the same test, first dropping the 5 constraints on the table,
> the restore takes less than 15 minutes!
>
> This was tested on both PG 14.2 and PG 13.6 (Linux 64-bit machines).
>
> Is there a memory leak, or is it normal that a backend process may exhaust
> the RAM to such an extent?

Hi Marc,
Can you post the server logs?

regards,
Ranier Vilela
Re: OOM killer while pg_restore
> On Thu, Mar 3, 2022 at 05:59, Marc Rechté wrote:
>> Hello,
>>
>> We have a pg_restore which fails due to RAM over-consumption by the
>> corresponding PG backend, which ends up with the OOM killer.
>>
>> The table has one PK, one index, and 3 FK constraints, all active while
>> restoring.
>> The dump contains over 200M rows for that table and is in custom format,
>> which corresponds to 37 GB of total relation size in the original DB.
>>
>> While importing, one can see the RSS + swap of the backend (the one
>> executing the COPY) increasing linearly.
>>
>> On my machine (quite an old PC), it failed after 16 hours, while the disk
>> usage was reaching 26 GB and memory usage was 9.1g (RSS+swap)
>>
>> If we do the same test, first dropping the 5 constraints on the table,
>> the restore takes less than 15 minutes!
>>
>> This was tested on both PG 14.2 and PG 13.6 (Linux 64-bit machines).
>>
>> Is there a memory leak, or is it normal that a backend process may
>> exhaust the RAM to such an extent?
>
> Hi Marc,
> Can you post the server logs?
>
> regards,
> Ranier Vilela

Will it help?

2022-02-25 12:01:29.306 GMT [1468:24] user=,db=,app=,client= LOG:  server process (PID 358995) was terminated by signal 9: Killed
2022-02-25 12:01:29.306 GMT [1468:25] user=,db=,app=,client= DETAIL:  Failed process was running: COPY simulations_ecarts_relatifs_saison (idpoint, annee, saison, idreferentiel, ecartreltav, ecartreltnav, ecartreltxav, ecartreltrav, ecartreltxq90, ecartreltxq10, ecartreltnq10, ecartreltnq90, ecartreltxnd, ecartreltnnd, ecartreltnht, ecartreltxhwd, ecartreltncwd, ecartreltnfd, ecartreltxfd, ecartrelsd, ecartreltr, ecartrelhdd, ecartrelcdd, ecartrelpav, ecartrelpint, ecartrelrr, ecartrelpfl90, ecartrelrr1mm, ecartrelpxcwd, ecartrelpn20mm, ecartrelpxcdd, ecartrelhusav, ecartreltx35, ecartrelpq90, ecartrelpq99, ecartrelrr99, ecartrelffav, ecartrelff3, ecartrelffq98, ecartrelff98) FROM stdin;
2022-02-25 12:01:29.306 GMT [1468:26] user=,db=,app=,client= LOG:  terminating any other active server processes
2022-02-25 12:01:29.311 GMT [1468:27] user=,db=,app=,client= LOG:  all server processes terminated; reinitializing
2022-02-25 12:01:29.326 GMT [360309:1] user=,db=,app=,client= LOG:  database system was interrupted; last known up at 2022-02-25 12:01:12 GMT
2022-02-25 12:01:29.362 GMT [360310:1] user=[unknown],db=[unknown],app=[unknown],client=[local] LOG:  connection received: host=[local]
2022-02-25 12:01:29.363 GMT [360310:2] user=postgres,db=drias,app=[unknown],client=[local] FATAL:  the database system is in recovery mode
2022-02-25 12:01:29.365 GMT [360309:2] user=,db=,app=,client= LOG:  database system was not properly shut down; automatic recovery in progress
2022-02-25 12:01:29.367 GMT [360309:3] user=,db=,app=,client= LOG:  redo starts at C3/1E0D31F0
2022-02-25 12:01:40.845 GMT [360309:4] user=,db=,app=,client= LOG:  redo done at C3/6174BC00 system usage: CPU: user: 4.15 s, system: 1.40 s, elapsed: 11.47 s
2022-02-25 12:01:40.847 GMT [360309:5] user=,db=,app=,client= LOG:  checkpoint starting: end-of-recovery immediate
2022-02-25 12:01:41.806 GMT [360309:6] user=,db=,app=,client= LOG:  checkpoint complete: wrote 125566 buffers (100.0%); 0 WAL file(s) added, 54 removed, 13 recycled; write=0.915 s, sync=0.001 s, total=0.960 s; sync files=10, longest=0.001 s, average=0.001 s; distance=1104355 kB, estimate=1104355 kB
2022-02-25 12:01:41.810 GMT [1468:28] user=,db=,app=,client= LOG:  database system is ready to accept connections
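A signal 9 kill like this one normally leaves a matching entry in the kernel log; on Linux it can be confirmed with something along these lines (the exact message wording varies by kernel version):

$ dmesg -T | grep -iE 'out of memory|killed process'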
Re: OOM killer while pg_restore
On Thu, Mar 3, 2022 at 09:19, Marc Rechté wrote:
> On Thu, Mar 3, 2022 at 05:59, Marc Rechté wrote:
> >
> > Hello,
> >
> > We have a pg_restore which fails due to RAM over-consumption by
> > the corresponding PG backend, which ends up with the OOM killer.
> >
> > The table has one PK, one index, and 3 FK constraints, all active
> > while restoring.
> > The dump contains over 200M rows for that table and is in custom
> > format, which corresponds to 37 GB of total relation size in the
> > original DB.
> >
> > While importing, one can see the RSS + swap of the backend (the one
> > executing the COPY) increasing linearly.
> >
> > On my machine (quite an old PC), it failed after 16 hours, while the
> > disk usage was reaching 26 GB and memory usage was 9.1g (RSS+swap)
> >
> > If we do the same test, first dropping the 5 constraints on
> > the table, the restore takes less than 15 minutes!
> >
> > This was tested on both PG 14.2 and PG 13.6 (Linux 64-bit machines).
> >
> > Is there a memory leak, or is it normal that a backend process may
> > exhaust the RAM to such an extent?
> >
> > Hi Marc,
> > Can you post the server logs?
> >
> > regards,
> > Ranier Vilela
>
> Will it help?
>
Show some direction.

> 2022-02-25 12:01:29.306 GMT [1468:24] user=,db=,app=,client= LOG:  server process (PID 358995) was terminated by signal 9: Killed
> 2022-02-25 12:01:29.306 GMT [1468:25] user=,db=,app=,client= DETAIL:  Failed process was running: COPY simulations_ecarts_relatifs_saison (idpoint, annee, saison, idreferentiel, ecartreltav, ecartreltnav, ecartreltxav, ecartreltrav, ecartreltxq90, ecartreltxq10, ecartreltnq10, ecartreltnq90, ecartreltxnd, ecartreltnnd, ecartreltnht, ecartreltxhwd, ecartreltncwd, ecartreltnfd, ecartreltxfd, ecartrelsd, ecartreltr, ecartrelhdd, ecartrelcdd, ecartrelpav, ecartrelpint, ecartrelrr, ecartrelpfl90, ecartrelrr1mm, ecartrelpxcwd, ecartrelpn20mm, ecartrelpxcdd, ecartrelhusav, ecartreltx35, ecartrelpq90, ecartrelpq99, ecartrelrr99, ecartrelffav, ecartrelff3, ecartrelffq98, ecartrelff98) FROM stdin;
>
COPY leak?

regards,
Ranier Vilela
Re: OOM killer while pg_restore
On Thu, Mar 03, 2022 at 09:59:03AM +0100, Marc Rechté wrote:
> Hello,
>
> We have a pg_restore which fails due to RAM over-consumption by the
> corresponding PG backend, which ends up with the OOM killer.
>
> The table has one PK, one index, and 3 FK constraints, all active while restoring.

Send the schema for the table, index, and constraints (\d in psql).

What are the server settings?
https://wiki.postgresql.org/wiki/Server_Configuration

What OS/version?

> The dump contains over 200M rows for that table and is in custom format,
> which corresponds to 37 GB of total relation size in the original DB.
>
> While importing, one can see the RSS + swap of the backend (the one
> executing the COPY) increasing linearly.
>
> On my machine (quite an old PC), it failed after 16 hours, while the disk usage
> was reaching 26 GB and memory usage was 9.1g (RSS+swap)
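For reference, the requested information can be gathered roughly like this (a sketch; the table name is taken from the COPY shown in the server log, and the settings query follows the wiki page linked above):

\d simulations_ecarts_relatifs_saison

-- non-default server settings
SELECT name, current_setting(name), source
  FROM pg_settings
 WHERE source NOT IN ('default', 'override');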
RE: An I/O error occurred while sending to the backend (PG 13.4)
> -----Original Message-----
> From: Justin Pryzby
> Sent: Tuesday, March 1, 2022 14:27
> To: [email protected]
> Cc: [email protected]
> Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
>
> On Tue, Mar 01, 2022 at 04:28:31PM +, [email protected] wrote:
> > Now, there is an additional component I think... Storage is on an array
> > and I am not getting a clear answer as to where it is 😊 Is it possible
> > that something is happening at the storage layer? Could that be reported
> > as a network issue vs a storage issue for Postgres?
>
> No. If there were an error with storage, it'd be reported as a local error,
> and the query would fail, rather than failing with client-server
> communication.
>
> > Also, both machines are actually VMs. I forgot to mention that and not
> > sure if that's relevant.
>
> Are they running on the same hypervisor ? Is that hyperv ?
> Lacking other good hypotheses, that does seem relevant.
>
> --
> Justin

Issue happened again last night. I did implement your recommendations but it didn't seem to prevent the issue:

tcp_keepalives_idle=9          # TCP_KEEPIDLE, in seconds;
                               # 0 selects the system default
tcp_keepalives_interval=9      # TCP_KEEPINTVL, in seconds;
                               # 0 selects the system default
tcp_keepalives_count=0         # TCP_KEEPCNT;
                               # 0 selects the system default
#tcp_user_timeout = 0          # TCP_USER_TIMEOUT, in milliseconds;
                               # 0 selects the system default

On the client application, the exceptions are:

2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Unexpected error
2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop because of:
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2022/03/03 01:04:56 - Upsert2.0 -     at java.lang.Thread.run(Thread.java:745)
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
2022/03/03 01:04:56 - Upsert2.0 -     ... 2 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:134)
2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
2022/03/03 01:04:56 - Upsert2.0 -     ... 6 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: java.net.SocketException: Connection reset
2022/03/03 01:04:56 - Upsert2.0 -     at java.net.SocketInputStream.read(SocketInputStream.java:209)
2022/03/03 01:04:56 - Upsert2.0 -     at java.net.SocketInputStream.read(SocketInputStream.java:141)
2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
2022/03/03 01:04:56 - Upsert2.0 -     at
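One way to double-check which values the running server actually uses for these parameters (a sketch, from any psql session):

SELECT name, setting, source
  FROM pg_settings
 WHERE name IN ('tcp_keepalives_idle', 'tcp_keepalives_interval',
                'tcp_keepalives_count', 'tcp_user_timeout');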
Re: OOM killer while pg_restore
Marc Rechté writes:
> We have a pg_restore which fails due to RAM over-consumption by the
> corresponding PG backend, which ends up with the OOM killer.
> The table has one PK, one index, and 3 FK constraints, all active while
> restoring.
> The dump contains over 200M rows for that table and is in custom format,
> which corresponds to 37 GB of total relation size in the original DB.

The FKs would result in queueing row trigger events, which would occupy
some memory. But those should only need ~12 bytes per FK per row, which
works out to less than 10GB for this number of rows, so it may be that
you've hit something else that we would consider a leak.

Does memory consumption hold steady if you drop the FK constraints?

If not, as others have noted, we'd need more info to investigate this.
The leak is probably independent of the specific data in the table, so
maybe you could make a small self-contained example using a script to
generate dummy data.

			regards, tom lane
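As a rough back-of-the-envelope check of that estimate, the arithmetic can be done directly in SQL (a sketch; 200M rows, 3 FKs, and ~12 bytes per FK per row are the figures from the message above):

-- ~12 bytes of queued trigger events per FK per row, 3 FKs, 200M rows
SELECT pg_size_pretty(200000000::bigint * 3 * 12);  -- about 6.7 GiB, i.e. "less than 10GB"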
Re: An I/O error occurred while sending to the backend (PG 13.4)
On Thu, Mar 3, 2022 at 11:55, [email protected] <[email protected]> wrote:
>
>> -----Original Message-----
>> From: Justin Pryzby
>> Sent: Tuesday, March 1, 2022 14:27
>> To: [email protected]
>> Cc: [email protected]
>> Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
>>
>> On Tue, Mar 01, 2022 at 04:28:31PM +, [email protected] wrote:
>> > Now, there is an additional component I think... Storage is on an array
>> > and I am not getting a clear answer as to where it is 😊 Is it possible
>> > that something is happening at the storage layer? Could that be reported
>> > as a network issue vs a storage issue for Postgres?
>>
>> No. If there were an error with storage, it'd be reported as a local error,
>> and the query would fail, rather than failing with client-server
>> communication.
>>
>> > Also, both machines are actually VMs. I forgot to mention that and not
>> > sure if that's relevant.
>>
>> Are they running on the same hypervisor ? Is that hyperv ?
>> Lacking other good hypotheses, that does seem relevant.
>>
>> --
>> Justin
>
> Issue happened again last night. I did implement your recommendations but
> it didn't seem to prevent the issue:
>
> tcp_keepalives_idle=9          # TCP_KEEPIDLE, in seconds;
>                                # 0 selects the system default
> tcp_keepalives_interval=9      # TCP_KEEPINTVL, in seconds;
>                                # 0 selects the system default
> tcp_keepalives_count=0         # TCP_KEEPCNT;
>                                # 0 selects the system default
> #tcp_user_timeout = 0          # TCP_USER_TIMEOUT, in milliseconds;
>                                # 0 selects the system default
>
> On the client application, the exceptions are:
>
> 2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : Unexpected error
> 2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 2017-05-16 17.18.02 by buildguy) : org.pentaho.di.core.exception.KettleStepException:
> 2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop because of:
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
> 2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
> 2022/03/03 01:04:56 - Upsert2.0 -     at java.lang.Thread.run(Thread.java:745)
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.pentaho.di.core.exception.KettleDatabaseException:
> 2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
> 2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
> 2022/03/03 01:04:56 - Upsert2.0 -     ... 2 more
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.postgresql.util.PSQLException: An I/O error occurred while sending to the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:134)
> 2022/03/03 01:04:56 - Upsert2.0 -     at org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
> 2022/03/03 01:04:56 - Upsert2.0 -     ... 6 more
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by: java.net.SocketException: Connection reset
> 2022/03/03 01:04:56 - Upsert2.0 -     at java.net.SocketInputStream.read(SocketInputStream.java:209)
> 2022/03/03
Re: An I/O error occurred while sending to the backend (PG 13.4)
On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> Sorry, but this is much more on the client side.

The client is reporting the problem, as is the server.

> Following the logs, it is understood that the client is dropping the
> connection.

The logs show that the client's connection *was* dropped.
And on the server, the same.

> So most likely the error could be from Pentaho or JDBC.
>
> https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> "This *SocketException* occurs on the server-side when the client closed
> the socket connection before the response could be returned over the
> socket."
>
> I suggest moving this thread to the Pentaho or JDBC support.

We don't know the source of the problem. I still doubt it's in postgres, but I
don't think it's helpful to blame the client, just because the client reported
the problem. If the server were to disconnect abruptly, I'd expect the client
to report that, too.

Laurent would just have to start the conversation over (and probably collect
the same diagnostic information anyway). The client projects could blame
postgres with as much rationale as there is for us to blame the client.

Please don't add confusion here. I made suggestions for how to collect more
information to better understand the source of the problem, and there's
probably not much else to say without that.

--
Justin
Re: An I/O error occurred while sending to the backend (PG 13.4)
On Thu, Mar 3, 2022 at 13:46, Justin Pryzby wrote:
> On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> > Sorry, but this is much more on the client side.
>
> The client is reporting the problem, as is the server.

Have you read the server log?

"2022-03-03 01:04:40 EST [21228] LOG:  could not receive data from client: An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG:  unexpected EOF on client connection with an open transaction"

> > Following the logs, it is understood that the client is dropping the
> > connection.
>
> The logs show that the client's connection *was* dropped.
> And on the server, the same.

No, the server log shows that the client dropped the connection.

> > So most likely the error could be from Pentaho or JDBC.
> >
> > https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> > "This *SocketException* occurs on the server-side when the client closed
> > the socket connection before the response could be returned over the
> > socket."
> >
> > I suggest moving this thread to the Pentaho or JDBC support.
>
> We don't know the source of the problem.

Yeah, but it is much more likely to be on the client.

> I still doubt it's in postgres,

Everything indicates it is not.

> but I don't think it's helpful to blame the client, just because the client
> reported the problem. If the server were to disconnect abruptly, I'd expect
> the client to report that, too.
>
> Laurent would just have to start the conversation over (and probably collect
> the same diagnostic information anyway). The client projects could blame
> postgres with as much rationale as there is for us to blame the client.
>
> Please don't add confusion here.

I just suggested; this is not an order.

regards,
Ranier Vilela
RE: An I/O error occurred while sending to the backend (PG 13.4)
I am also starting to feel that the issue being on the database's side is less and less likely. There is something happening in between, or possibly on the client.

Ranier, the only reason I was focusing on this at the PG level is that this issue started to show up several months ago, shortly after I updated to PG13 from PG11. We had run PG11 for 2 years without ever seeing that issue at all. The ETL itself hasn't changed either, except for upgrading the JDBC driver... But I did revert back to an older JDBC driver and the issue still occurred eventually.

Of course, other things could have changed in the client's IT infrastructure that I am not aware of, so I am pushing that angle as well more aggressively now. I am also pushing for WireShark to monitor the network more closely. Stay tuned!

Thank you so much all for your support, but at this time I think the ball is in my court, and I am working out a plan from here.

Thank you,
Laurent.

From: Ranier Vilela
Sent: Thursday, March 3, 2022 11:57
To: Justin Pryzby
Cc: [email protected]; [email protected]
Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)

On Thu, Mar 3, 2022 at 13:46, Justin Pryzby <[email protected]> wrote:
> On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> > Sorry, but this is much more on the client side.
>
> The client is reporting the problem, as is the server.

Have you read the server log?

"2022-03-03 01:04:40 EST [21228] LOG:  could not receive data from client: An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG:  unexpected EOF on client connection with an open transaction"

> > Following the logs, it is understood that the client is dropping the
> > connection.
>
> The logs show that the client's connection *was* dropped.
> And on the server, the same.

No, the server log shows that the client dropped the connection.

> > So most likely the error could be from Pentaho or JDBC.
> >
> > https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> > "This *SocketException* occurs on the server-side when the client closed
> > the socket connection before the response could be returned over the
> > socket."
> >
> > I suggest moving this thread to the Pentaho or JDBC support.
>
> We don't know the source of the problem.

Yeah, but it is much more likely to be on the client.

> I still doubt it's in postgres,

Everything indicates it is not.

> but I don't think it's helpful to blame the client, just because the client
> reported the problem. If the server were to disconnect abruptly, I'd expect
> the client to report that, too.
>
> Laurent would just have to start the conversation over (and probably collect
> the same diagnostic information anyway). The client projects could blame
> postgres with as much rationale as there is for us to blame the client.
>
> Please don't add confusion here.

I just suggested; this is not an order.

regards,
Ranier Vilela
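For the capture itself, a minimal sketch that produces a file Wireshark can open afterwards (assumptions: it runs on the database host, the interface is eth0, the server listens on the default port 5432, and pg_traffic.pcap is just an example file name):

$ tcpdump -i eth0 -w pg_traffic.pcap 'tcp port 5432'
# leave it running until the next occurrence of the error, then open pg_traffic.pcap in Wireshark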
Re: An I/O error occurred while sending to the backend (PG 13.4)
On Thu, Mar 3, 2022 at 15:19, [email protected] <[email protected]> wrote:
> I am also starting to feel that the issue being on the database's side is
> less and less likely. There is something happening in between, or possibly
> on the client.
>
> Ranier, the only reason I was focusing on this at the PG level is that
> this issue started to show up several months ago, shortly after I updated
> to PG13 from PG11. We had run PG11 for 2 years without ever seeing that
> issue at all. The ETL itself hasn't changed either, except for upgrading
> the JDBC driver... But I did revert back to an older JDBC driver and the
> issue still occurred eventually.
>
> Of course, other things could have changed in the client's IT
> infrastructure that I am not aware of, so I am pushing that angle as well
> more aggressively now. I am also pushing for WireShark to monitor the
> network more closely. Stay tuned!
>
> Thank you so much all for your support, but at this time I think the ball
> is in my court, and I am working out a plan from here.

You are welcome.

regards,
Ranier Vilela
Re: OOM killer while pg_restore
On 03/03/2022 at 16:31, Tom Lane wrote:
> Marc Rechté writes:
>> We have a pg_restore which fails due to RAM over-consumption by the
>> corresponding PG backend, which ends up with the OOM killer.
>> The table has one PK, one index, and 3 FK constraints, all active while
>> restoring.
>> The dump contains over 200M rows for that table and is in custom format,
>> which corresponds to 37 GB of total relation size in the original DB.
>
> The FKs would result in queueing row trigger events, which would occupy
> some memory. But those should only need ~12 bytes per FK per row, which
> works out to less than 10GB for this number of rows, so it may be that
> you've hit something else that we would consider a leak.
>
> Does memory consumption hold steady if you drop the FK constraints?
>
> If not, as others have noted, we'd need more info to investigate this.
> The leak is probably independent of the specific data in the table, so
> maybe you could make a small self-contained example using a script to
> generate dummy data.
>
> 			regards, tom lane

Actually the number of rows is 232735712. Accordingly, the RAM consumption would be 232735712 x 12 x 3 = 7.8 GiB. This is close to the 8.1g I reported earlier (actually it was closer to 7.8 GB, due to GiB vs. GB confusion). So there is no memory leak.

It took 16 hours on my box to reach that RAM consumption, and then the COPY failed when checking the first FK (as the referenced table was empty).

I dropped the PK, the index, and the 3 FK constraints, and started the pg_restore over: 11 minutes to load the table (I did not have time to note RAM consumption).
I then created the PK and index: 24 minutes.
For the FKs, I don't know, because the referenced tables are empty (but I'll be able to test next week, if deemed necessary).

16 hours vs. 35 minutes to reach the same state.

This is the data structure:
===========================

--
-- Name: simulations_ecarts_relatifs_saison; Type: TABLE; Schema: donnees2019; Owner: drias; Tablespace:
--

CREATE TABLE simulations_ecarts_relatifs_saison (
    idpoint integer NOT NULL,
    annee integer NOT NULL,
    saison integer NOT NULL,
    idreferentiel integer NOT NULL,
    ecartreltav real,
    ecartreltnav real,
    ecartreltxav real,
    ecartreltrav real,
    ecartreltxq90 real,
    ecartreltxq10 real,
    ecartreltnq10 real,
    ecartreltnq90 real,
    ecartreltxnd smallint,
    ecartreltnnd smallint,
    ecartreltnht smallint,
    ecartreltxhwd smallint,
    ecartreltncwd smallint,
    ecartreltnfd smallint,
    ecartreltxfd smallint,
    ecartrelsd smallint,
    ecartreltr smallint,
    ecartrelhdd real,
    ecartrelcdd real,
    ecartrelpav real,
    ecartrelpint real,
    ecartrelrr real,
    ecartrelpfl90 real,
    ecartrelrr1mm real,
    ecartrelpxcwd smallint,
    ecartrelpn20mm smallint,
    ecartrelpxcdd smallint,
    ecartrelhusav real,
    ecartreltx35 real,
    ecartrelpq90 real,
    ecartrelpq99 real,
    ecartrelrr99 real,
    ecartrelffav real,
    ecartrelff3 real,
    ecartrelffq98 real,
    ecartrelff98 real
);

--
-- Name: pk_simulations_ecarts_relatifs_saison_2019; Type: CONSTRAINT; Schema: donnees2019; Owner: drias; Tablespace:
--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT pk_simulations_ecarts_relatifs_saison_2019 PRIMARY KEY (idpoint, annee, saison, idreferentiel);

--
-- Name: i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019; Type: INDEX; Schema: donnees2019; Owner: drias; Tablespace:
--

CREATE INDEX i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019 ON simulations_ecarts_relatifs_saison USING btree (idreferentiel, annee, saison);

--
-- Name: fk_id_point_ecarts_relatifs_saison_2019; Type: FK CONSTRAINT; Schema: donnees2019; Owner: drias
--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_point_ecarts_relatifs_saison_2019 FOREIGN KEY (idpoint) REFERENCES grilles.points_grille(id);

--
-- Name: fk_id_referentiel_ecarts_relatifs_saison_2019; Type: FK CONSTRAINT; Schema: donnees2019; Owner: drias
--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_referentiel_ecarts_relatifs_saison_2019 FOREIGN KEY (idreferentiel) REFERENCES referentiel.referentiel_simulations(id);

--
-- Name: fk_saison_ecarts_relatifs_saison_2019; Type: FK CONSTRAINT; Schema: donnees2019; Owner: drias
--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_saison_ecarts_relatifs_saison_2019 FOREIGN KEY (saison) REFERENCES donnees.liste_saison(code_saison);

This is how I initialized / started the test instance:
=======================================================

$ initdb -D $MYDIR
$ pg_ctl -D $MYDIR -o "-p 5432 -c unix_socket_directories=. -c shared_buffers=981MB -c work_mem=20MB -c maintenance_work_mem=98MB" start
Re: OOM killer while pg_restore
Marc Rechté writes:
> On 03/03/2022 at 16:31, Tom Lane wrote:
>> Does memory consumption hold steady if you drop the FK constraints?

> Actually the number of rows is 232735712.
> Accordingly, the RAM consumption would be 232735712 x 12 x 3 = 7.8 GiB.
> This is close to the 8.1g I reported earlier (actually it was closer to
> 7.8 GB, due to GiB vs. GB confusion).
> So there is no memory leak.
> It took 16 hours on my box to reach that RAM consumption, and then the
> COPY failed when checking the first FK (as the referenced table was empty).

I'm guessing it was swapping like mad :-(

We've long recommended dropping FK constraints during bulk data loads, and
then re-establishing them later. That's a lot cheaper than retail validity
checks, even without the memory-consumption angle. Ideally that sort of
behavior would be automated, but nobody's gotten that done yet. (pg_restore
does do it like that during a full restore, but not for a data-only restore,
so I guess you were doing the latter.)

			regards, tom lane
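To make that recommendation concrete for this table, a sketch of the drop-then-re-add sequence (the object names come from the DDL posted earlier and assume the search_path points at the donnees2019 schema; the pg_restore invocation and the dump file name are assumptions about how the data-only restore is run):

-- drop the constraints and index that slow down the bulk load
ALTER TABLE simulations_ecarts_relatifs_saison DROP CONSTRAINT fk_id_point_ecarts_relatifs_saison_2019;
ALTER TABLE simulations_ecarts_relatifs_saison DROP CONSTRAINT fk_id_referentiel_ecarts_relatifs_saison_2019;
ALTER TABLE simulations_ecarts_relatifs_saison DROP CONSTRAINT fk_saison_ecarts_relatifs_saison_2019;
ALTER TABLE simulations_ecarts_relatifs_saison DROP CONSTRAINT pk_simulations_ecarts_relatifs_saison_2019;
DROP INDEX i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019;

-- run the data-only restore outside of psql, e.g.:
--   pg_restore --data-only -t simulations_ecarts_relatifs_saison -d drias dumpfile

-- then re-create the index and constraints (same DDL as in the dump)
ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT pk_simulations_ecarts_relatifs_saison_2019 PRIMARY KEY (idpoint, annee, saison, idreferentiel);
CREATE INDEX i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019
    ON simulations_ecarts_relatifs_saison USING btree (idreferentiel, annee, saison);
ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_point_ecarts_relatifs_saison_2019 FOREIGN KEY (idpoint) REFERENCES grilles.points_grille(id);
ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_referentiel_ecarts_relatifs_saison_2019 FOREIGN KEY (idreferentiel) REFERENCES referentiel.referentiel_simulations(id);
ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_saison_ecarts_relatifs_saison_2019 FOREIGN KEY (saison) REFERENCES donnees.liste_saison(code_saison);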
