OOM killer while pg_restore

2022-03-03 Thread Marc Rechté

Hello,

We have a pg_restore which fails due to RAM over-consumption of the 
corresponding PG backend, which ends up with the OOM killer.


The table has one PK, one index, and 3 FK constraints, active while 
restoring.
The dump contains over 200M rows for that table and is in custom format, 
which corresponds to 37 GB of total relation size in the original DB.


While importing, one can see the RSS + swap increasing linearly for the 
backend (executing the COPY)


On my machine (quite old PC), it failed after 16 hours, while the disk 
usage was reaching 26 GB and memory usage was 9.1g (RSS+swap)


If we do the same test after first dropping the 5 constraints on the 
table, the restore takes less than 15 minutes!


This was tested on both PG 14.2 and PG 13.6 (linux 64-bit machines).

Is there a memory leak, or is it normal that a backend process may 
exhaust the RAM to such an extent?


Thanks

Regards


Re: OOM killer while pg_restore

2022-03-03 Thread Ranier Vilela
On Thu, Mar 3, 2022 at 05:59, Marc Rechté wrote:

> Hello,
>
> We have a pg_restore which fails due to RAM over-consumption of the
> corresponding PG backend, which ends-up with OOM killer.
>
> The table has one PK, one index, and 3 FK constraints, active while
> restoring.
> The dump contains over 200M rows for that table and is in custom format,
> which corresponds to 37 GB of total relation size in the original DB.
>
> While importing, one can see the RSS + swap increasing linearly for the
> backend (executing the COPY)
>
> On my machine (quite old PC), it failed after 16 hours, while the disk
> usage was reaching 26 GB and memory usage was 9.1g (RSS+swap)
>
> If we do the same test after first dropping the 5 constraints on the
> table, the restore takes less than 15 minutes!
>
> This was tested on both PG 14.2 and PG 13.6 (linux 64-bit machines).
>
> Is there a memory leak, or is it normal that a backend process may exhaust
> the RAM to such an extent?
>
Hi Marc,
Can you post the server logs?

regards,
Ranier Vilela


Re: OOM killer while pg_restore

2022-03-03 Thread Marc Rechté

On Thu, Mar 3, 2022 at 05:59, Marc Rechté wrote:


Hello,

We have a pg_restore which fails due to RAM over-consumption of
the corresponding PG backend, which ends-up with OOM killer.

The table has one PK, one index, and 3 FK constraints, active
while restoring.
The dump contains over 200M rows for that table and is in custom
format, which corresponds to 37 GB of total relation size in the
original DB.

While importing, one can see the RSS + swap increasing linearly
for the backend (executing the COPY)

On my machine (quite old PC), it failed after 16 hours, while the
disk usage was reaching 26 GB and memory usage was 9.1g (RSS+swap)

If we do the same test after first dropping the 5 constraints on
the table, the restore takes less than 15 minutes!

This was tested on both PG 14.2 and PG 13.6 (linux 64-bit machines).

Is there a memory leak, or is it normal that a backend process may
exhaust the RAM to such an extent?

Hi Marc,
Can you post the server logs?

regards,
Ranier Vilela


Will it help?

2022-02-25 12:01:29.306 GMT [1468:24] user=,db=,app=,client= LOG:  
server process (PID 358995) was terminated by signal 9: Killed
2022-02-25 12:01:29.306 GMT [1468:25] user=,db=,app=,client= DETAIL:  
Failed process was running: COPY simulations_ecarts_relatifs_saison 
(idpoint, annee, saison, idreferentiel, ecartreltav, ecartreltnav, 
ecartreltxav, ecartreltrav, ecartreltxq90, ecartreltxq10, ecartreltnq10, 
ecartreltnq90, ecartreltxnd, ecartreltnnd, ecartreltnht, ecartreltxhwd, 
ecartreltncwd, ecartreltnfd, ecartreltxfd, ecartrelsd, ecartreltr, 
ecartrelhdd, ecartrelcdd, ecartrelpav, ecartrelpint, ecartrelrr, 
ecartrelpfl90, ecartrelrr1mm, ecartrelpxcwd, ecartrelpn20mm, 
ecartrelpxcdd, ecartrelhusav, ecartreltx35, ecartrelpq90, ecartrelpq99, 
ecartrelrr99, ecartrelffav, ecartrelff3, ecartrelffq98, ecartrelff98) 
FROM stdin;


2022-02-25 12:01:29.306 GMT [1468:26] user=,db=,app=,client= LOG: 
terminating any other active server processes
2022-02-25 12:01:29.311 GMT [1468:27] user=,db=,app=,client= LOG: all 
server processes terminated; reinitializing
2022-02-25 12:01:29.326 GMT [360309:1] user=,db=,app=,client= LOG:  
database system was interrupted; last known up at 2022-02-25 12:01:12 GMT
2022-02-25 12:01:29.362 GMT [360310:1] 
user=[unknown],db=[unknown],app=[unknown],client=[local] LOG: connection 
received: host=[local]
2022-02-25 12:01:29.363 GMT [360310:2] 
user=postgres,db=drias,app=[unknown],client=[local] FATAL:  the database 
system is in recovery mode
2022-02-25 12:01:29.365 GMT [360309:2] user=,db=,app=,client= LOG:  
database system was not properly shut down; automatic recovery in progress
2022-02-25 12:01:29.367 GMT [360309:3] user=,db=,app=,client= LOG:  redo 
starts at C3/1E0D31F0
2022-02-25 12:01:40.845 GMT [360309:4] user=,db=,app=,client= LOG:  redo 
done at C3/6174BC00 system usage: CPU: user: 4.15 s, system: 1.40 s, 
elapsed: 11.47 s
2022-02-25 12:01:40.847 GMT [360309:5] user=,db=,app=,client= LOG:  
checkpoint starting: end-of-recovery immediate
2022-02-25 12:01:41.806 GMT [360309:6] user=,db=,app=,client= LOG:  
checkpoint complete: wrote 125566 buffers (100.0%); 0 WAL file(s) added, 
54 removed, 13 recycled; write=0.915 s, sync=0.001 s, total=0.960 s; 
sync files=10, longest=0.001 s, average=0.001 s; distance=1104355 kB, 
estimate=1104355 kB
2022-02-25 12:01:41.810 GMT [1468:28] user=,db=,app=,client= LOG: 
database system is ready to accept connections






Re: OOM killer while pg_restore

2022-03-03 Thread Ranier Vilela
On Thu, Mar 3, 2022 at 09:19, Marc Rechté wrote:

> On Thu, Mar 3, 2022 at 05:59, Marc Rechté wrote:
> >
> > Hello,
> >
> > We have a pg_restore which fails due to RAM over-consumption of
> > the corresponding PG backend, which ends-up with OOM killer.
> >
> > The table has one PK, one index, and 3 FK constraints, active
> > while restoring.
> > The dump contains over 200M rows for that table and is in custom
> > format, which corresponds to 37 GB of total relation size in the
> > original DB.
> >
> > While importing, one can see the RSS + swap increasing linearly
> > for the backend (executing the COPY)
> >
> > On my machine (quite old PC), it failed after 16 hours, while the
> > disk usage was reaching 26 GB and memory usage was 9.1g (RSS+swap)
> >
> > If we do the same test after first dropping the 5 constraints on
> > the table, the restore takes less than 15 minutes!
> >
> > This was tested on both PG 14.2 and PG 13.6 (linux 64-bit machines).
> >
> > Is there a memory leak, or is it normal that a backend process may
> > exhaust the RAM to such an extent?
> >
> > Hi Marc,
> > Can you post the server logs?
> >
> > regards,
> > Ranier Vilela
>
> Will it help?
>
It shows some direction.


> 2022-02-25 12:01:29.306 GMT [1468:24] user=,db=,app=,client= LOG:
> server process (PID 358995) was terminated by signal 9: Killed
> 2022-02-25 12:01:29.306 GMT [1468:25] user=,db=,app=,client= DETAIL:
> Failed process was running: COPY simulations_ecarts_relatifs_saison
> (idpoint, annee, saison, idreferentiel, ecartreltav, ecartreltnav,
> ecartreltxav, ecartreltrav, ecartreltxq90, ecartreltxq10, ecartreltnq10,
> ecartreltnq90, ecartreltxnd, ecartreltnnd, ecartreltnht, ecartreltxhwd,
> ecartreltncwd, ecartreltnfd, ecartreltxfd, ecartrelsd, ecartreltr,
> ecartrelhdd, ecartrelcdd, ecartrelpav, ecartrelpint, ecartrelrr,
> ecartrelpfl90, ecartrelrr1mm, ecartrelpxcwd, ecartrelpn20mm,
> ecartrelpxcdd, ecartrelhusav, ecartreltx35, ecartrelpq90, ecartrelpq99,
> ecartrelrr99, ecartrelffav, ecartrelff3, ecartrelffq98, ecartrelff98)
> FROM stdin;
>
COPY leak?

regards,
Ranier Vilela


Re: OOM killer while pg_restore

2022-03-03 Thread Justin Pryzby
On Thu, Mar 03, 2022 at 09:59:03AM +0100, Marc Rechté wrote:
> Hello,
> 
> We have a pg_restore which fails due to RAM over-consumption of the
> corresponding PG backend, which ends-up with OOM killer.
> 
> The table has one PK, one index, and 3 FK constraints, active while restoring.

Send the schema for the table, index, and constraints (\d in psql).

What are the server settings ?
https://wiki.postgresql.org/wiki/Server_Configuration

What OS/version ?

> The dump contains over 200M rows for that table and is in custom format,
> which corresponds to 37 GB of total relation size in the original DB.
> 
> While importing, one can see the RSS + swap increasing linearly for the
> backend (executing the COPY)
> 
> On my machine (quite old PC), it failed after 16 hours, while the disk usage
> was reaching 26 GB and memory usage was 9.1g (RSS+swap)
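
For reference, a minimal way to gather that information (a sketch, assuming 
psql access to the target database; the pg_settings query is roughly the one 
suggested on the wiki page above):

-- schema, index and FK constraints of the affected table
\d simulations_ecarts_relatifs_saison

-- non-default server settings
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');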




RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread [email protected]


   >  -Original Message-
   >  From: Justin Pryzby 
   >  Sent: Tuesday, March 1, 2022 14:27
   >  To: [email protected]
   >  Cc: [email protected]
   >  Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)
   >  
   >  On Tue, Mar 01, 2022 at 04:28:31PM +, [email protected]
   >  wrote:
   >  > Now, there is an additional component I think... Storage is on an array
   >  and I am not getting a clear answer as to where it is 😊 Is it possible
   >  that something is happening at the storage layer? Could that be reported
   >  as a network issue vs a storage issue for Postgres?
   >  
   >  No.  If there were an error with storage, it'd be reported as a local
   >  error, and the query would fail, rather than failing with client-server
   >  communication.
   >  
   >  > Also, both machines are actually VMs. I forgot to mention that and not
   >  sure if that's relevant.
   >  
   >  Are they running on the same hypervisor ?  Is that hyperv ?
   >  Lacking other good hypotheses, that does seem relevant.
   >  
   >  --
   >  Justin

Issue happened again last night. I did implement your recommendations but it 
didn't seem to prevent the issue:

tcp_keepalives_idle=9   # TCP_KEEPIDLE, in seconds;
# 0 selects the system default
tcp_keepalives_interval=9   # TCP_KEEPINTVL, in seconds;
# 0 selects the system default
tcp_keepalives_count=0  # TCP_KEEPCNT;
# 0 selects the system default
#tcp_user_timeout = 0   # TCP_USER_TIMEOUT, in milliseconds;
# 0 selects the system default
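
As a quick sanity check (a sketch, assuming psql access to the server), the 
values the running server actually picked up can be read back from pg_settings:

SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'tcp_keepalives%' OR name = 'tcp_user_timeout';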

On the client application, the exceptions are:

2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 
2017-05-16 17.18.02 by buildguy) : Unexpected error
2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from 
2017-05-16 17.18.02 by buildguy) : 
org.pentaho.di.core.exception.KettleStepException: 
2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop 
because of:
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the 
backend.
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2022/03/03 01:04:56 - Upsert2.0 -at java.lang.Thread.run(Thread.java:745)
2022/03/03 01:04:56 - Upsert2.0 - Caused by: 
org.pentaho.di.core.exception.KettleDatabaseException: 
2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to the 
backend.
2022/03/03 01:04:56 - Upsert2.0 - 
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
2022/03/03 01:04:56 - Upsert2.0 -... 2 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: org.postgresql.util.PSQLException: 
An I/O error occurred while sending to the backend.
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:134)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
2022/03/03 01:04:56 - Upsert2.0 -... 6 more
2022/03/03 01:04:56 - Upsert2.0 - Caused by: java.net.SocketException: 
Connection reset
2022/03/03 01:04:56 - Upsert2.0 -at 
java.net.SocketInputStream.read(SocketInputStream.java:209)
2022/03/03 01:04:56 - Upsert2.0 -at 
java.net.SocketInputStream.read(SocketInputStream.java:141)
2022/03/03 01:04:56 - Upsert2.0 -at 
org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:161)
2022/03/03 01:04:56 - Upsert2.0 -at

Re: OOM killer while pg_restore

2022-03-03 Thread Tom Lane
Marc Rechté writes:
> We have a pg_restore which fails due to RAM over-consumption of the 
> corresponding PG backend, which ends-up with OOM killer.
> The table has one PK, one index, and 3 FK constraints, active while 
> restoring.
> The dump contains over 200M rows for that table and is in custom format, 
> which corresponds to 37 GB of total relation size in the original DB.

The FKs would result in queueing row trigger events, which would occupy
some memory.  But those should only need ~12 bytes per FK per row,
which works out to less than 10GB for this number of rows, so it may
be that you've hit something else that we would consider a leak.

Does memory consumption hold steady if you drop the FK constraints?

If not, as others have noted, we'd need more info to investigate
this.  The leak is probably independent of the specific data in
the table, so maybe you could make a small self-contained example
using a script to generate dummy data.

regards, tom lane
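
A minimal sketch of the kind of self-contained reproducer Tom suggests, using 
hypothetical table names (parent/child) and a scaled-down row count: a single 
INSERT ... SELECT into a table with an active FK queues one after-row trigger 
event per row, much as the COPY in the failing restore does.

-- dummy referenced table
CREATE TABLE parent (id integer PRIMARY KEY);
INSERT INTO parent SELECT g FROM generate_series(1, 1000) g;

-- dummy referencing table, FK left active during the load
CREATE TABLE child (
    id      bigint PRIMARY KEY,
    pid     integer NOT NULL REFERENCES parent(id),
    payload real
);

-- watch the backend's RSS while this runs; raise the upper bound to
-- approach the 200M-row case from the report
INSERT INTO child
SELECT g, (g % 1000) + 1, random()::real
FROM generate_series(1, 10000000) g;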




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Ranier Vilela
On Thu, Mar 3, 2022 at 11:55, [email protected]
<[email protected]> wrote:

>
>
>>  -Original Message-
>>  From: Justin Pryzby 
>>  Sent: Tuesday, March 1, 2022 14:27
>>  To: [email protected]
>>  Cc: [email protected]
>>  Subject: Re: An I/O error occurred while sending to the backend (PG
> 13.4)
>>
>>  On Tue, Mar 01, 2022 at 04:28:31PM +, [email protected]
>>  wrote:
>>  > Now, there is an additional component I think... Storage is on an
>>  array and I am not getting a clear answer as to where it is 😊 Is it
>>  possible that something is happening at the storage layer? Could that
>>  be reported as a network issue vs a storage issue for Postgres?
>>
>>  No.  If there were an error with storage, it'd be reported as a
>>  local error, and the query would fail, rather than failing with
>>  client-server communication.
>>
>>  > Also, both machines are actually VMs. I forgot to mention that and
>>  not sure if that's relevant.
>>
>>  Are they running on the same hypervisor ?  Is that hyperv ?
>>  Lacking other good hypotheses, that does seem relevant.
>>
>>  --
>>  Justin
>
> Issue happened again last night. I did implement your recommendations but
> it didn't seem to prevent the issue:
>
> tcp_keepalives_idle=9   # TCP_KEEPIDLE, in seconds;
> # 0 selects the system default
> tcp_keepalives_interval=9   # TCP_KEEPINTVL, in seconds;
> # 0 selects the system default
> tcp_keepalives_count=0  # TCP_KEEPCNT;
> # 0 selects the system default
> #tcp_user_timeout = 0   # TCP_USER_TIMEOUT, in milliseconds;
> # 0 selects the system default
>
> On the client application, the exceptions are:
>
> 2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from
> 2017-05-16 17.18.02 by buildguy) : Unexpected error
> 2022/03/03 01:04:56 - Upsert2.0 - ERROR (version 7.1.0.0-12, build 1 from
> 2017-05-16 17.18.02 by buildguy) :
> org.pentaho.di.core.exception.KettleStepException:
> 2022/03/03 01:04:56 - Upsert2.0 - Error in step, asking everyone to stop
> because of:
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
> 2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to
> the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:313)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> java.lang.Thread.run(Thread.java:745)
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by:
> org.pentaho.di.core.exception.KettleDatabaseException:
> 2022/03/03 01:04:56 - Upsert2.0 - Error inserting/updating row
> 2022/03/03 01:04:56 - Upsert2.0 - An I/O error occurred while sending to
> the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.core.database.Database.insertRow(Database.java:1321)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.core.database.Database.insertRow(Database.java:1245)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.core.database.Database.insertRow(Database.java:1233)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
> 2022/03/03 01:04:56 - Upsert2.0 -... 2 more
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by:
> org.postgresql.util.PSQLException: An I/O error occurred while sending to
> the backend.
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:382)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:490)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgStatement.execute(PgStatement.java:408)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:166)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.postgresql.jdbc.PgPreparedStatement.executeUpdate(PgPreparedStatement.java:134)
> 2022/03/03 01:04:56 - Upsert2.0 -at
> org.pentaho.di.core.database.Database.insertRow(Database.java:1288)
> 2022/03/03 01:04:56 - Upsert2.0 -... 6 more
> 2022/03/03 01:04:56 - Upsert2.0 - Caused by: java.net.SocketException:
> Connection reset
> 2022/03/03 01:04:56 - Upsert2.0 -at
> java.net.SocketInputStream.read(SocketInputStream.java:209)
> 2022/03/03 

Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Justin Pryzby
On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> Sorry, but this is much more on the client side.

The client is reporting the problem, as is the server.

> Following the logs, it is understood that the client is dropping the
> connection.

The logs show that the client's connection *was* dropped.
And on the server, the same.

> So most likely the error could be from Pentaho or JDBC.
> 
> https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> " This *SocketException* occurs on the server-side when the client closed
> the socket connection before the response could be returned over the
> socket."
> 
> I suggest moving this thread to the Pentaho or JDBC support.

We don't know the source of the problem.  I still doubt it's in postgres, but I
don't think it's helpful to blame the client, just because the client reported
the problem.  If the server were to disconnect abruptly, I'd expect the client
to report that, too.

Laurent would just have to start the conversation over (and probably collect
the same diagnostic information anyway).  The client projects could blame
postgres with as much rationale as there is for us to blame the client.

Please don't add confusion here.  I made suggestions for how to collect more
information to better understand the source of the problem, and there's
probably not much else to say without that.

-- 
Justin




Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Ranier Vilela
On Thu, Mar 3, 2022 at 13:46, Justin Pryzby wrote:

> On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> > Sorry, but this is much more on the client side.
>
> The client is reporting the problem, as is the server.
>
Have you read the server log?
" 2022-03-03 01:04:40 EST [21228] LOG:  could not receive data from client:
An existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG:  unexpected EOF on client connection
with an open transaction"


> > Following the logs, it is understood that the client is dropping the
> > connection.
>
> The logs show that the client's connection *was* dropped.
> And on the server, the same.
>
No, the server log shows that the client dropped the connection.


>
> > So most likely the error could be from Pentaho or JDBC.
> >
> >
> https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> > " This *SocketException* occurs on the server-side when the client closed
> > the socket connection before the response could be returned over the
> > socket."
> >
> > I suggest moving this thread to the Pentaho or JDBC support.
>
> We don't know the source of the problem.

Yeah, but it is much more likely to be on the client.


>   I still doubt it's in postgres,

Everything indicates not.

> but I
> don't think it's helpful to blame the client, just because the client
> reported
> the problem.  If the server were to disconnect abruptly, I'd expect the
> client
> to report that, too.
>
> Laurent would just have to start the conversation over (and probably
> collect
> the same diagnostic information anyway).  The client projects could blame
> postgres with as much rationale as there is for us to blame the client.
>
> Please don't add confusion here.

I was just suggesting; this is not an order.

regards,
Ranier Vilela


RE: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread [email protected]
I am also starting to feel that the issue being on the database’s side is less 
and less likely. There is something happening in between, or possibly on the 
client.

Ranier, the only reason I was focusing on this at the PG level is that this 
issue started to show up several months ago shortly after I updated to PG13 
from PG11. Had run PG11 for 2 years without ever seeing that issue at all. The 
ETL itself hasn’t changed either, except for upgrading the JDBC driver… But I 
did revert back to an older JDBC driver and the issue still did occur 
eventually.

Of course, other things could have changed in the client’s IT infrastructure 
that I am not aware of, so I am pushing that angle as well more aggressively 
now. I am also pushing for WireShark to monitor the network more closely. Stay 
tuned!

Thank you so much, all, for your support, but at this time I think the ball is 
in my camp to work out some plan.

Thank you,
Laurent.


From: Ranier Vilela 
Sent: Thursday, March 3, 2022 11:57
To: Justin Pryzby 
Cc: [email protected]; [email protected]
Subject: Re: An I/O error occurred while sending to the backend (PG 13.4)


On Thu, Mar 3, 2022 at 13:46, Justin Pryzby wrote:
On Thu, Mar 03, 2022 at 01:33:08PM -0300, Ranier Vilela wrote:
> Sorry, but this is much more on the client side.

The client is reporting the problem, as is the server.
Have you read the server log?
" 2022-03-03 01:04:40 EST [21228] LOG:  could not receive data from client: An 
existing connection was forcibly closed by the remote host.
2022-03-03 01:04:40 EST [21228] LOG:  unexpected EOF on client connection with 
an open transaction"


> Following the logs, it is understood that the client is dropping the
> connection.

The logs show that the client's connection *was* dropped.
And on the server, the same.
No, the server log shows that the client dropped the connection.


> So most likely the error could be from Pentaho or JDBC.
>
> https://www.geeksforgeeks.org/java-net-socketexception-in-java-with-examples/
> " This *SocketException* occurs on the server-side when the client closed
> the socket connection before the response could be returned over the
> socket."
>
> I suggest moving this thread to the Pentaho or JDBC support.

We don't know the source of the problem.
Yeah, but it is much more likely to be on the client.

  I still doubt it's in postgres,
Everything indicates not.

but I
don't think it's helpful to blame the client, just because the client reported
the problem.  If the server were to disconnect abruptly, I'd expect the client
to report that, too.

Laurent would just have to start the conversation over (and probably collect
the same diagnostic information anyway).  The client projects could blame
postgres with as much rationale as there is for us to blame the client.

Please don't add confusion here.
I was just suggesting; this is not an order.

regards,
Ranier Vilela


Re: An I/O error occurred while sending to the backend (PG 13.4)

2022-03-03 Thread Ranier Vilela
On Thu, Mar 3, 2022 at 15:19, [email protected]
<[email protected]> wrote:

> I am also starting to feel that the issue being on the database’s side is
> less and less likely. There is something happening in between, or possibly
> on the client.
>
>
>
> Ranier, the only reason I was focusing on this at the PG level is that
> this issue started to show up several months ago shortly after I updated to
> PG13 from PG11. Had run PG11 for 2 years without ever seeing that issue at
> all. The ETL itself hasn’t changed either, except for upgrading the JDBC
> driver… But I did revert back to an older JDBC driver and the issue still
> did occur eventually.
>
>
>
> Of course, other things could have changed in the client’s IT
> infrastructure that I am not aware of, so I am pushing that angle as well
> more aggressively now. I am also pushing for WireShark to monitor the
> network more closely. Stay tuned!
>
>
>
> Thank you so much, all, for your support, but at this time I think the ball
> is in my camp to work out some plan.
>
You are welcome.

regards,
Ranier Vilela


Re: OOM killer while pg_restore

2022-03-03 Thread Marc Rechté

On 03/03/2022 at 16:31, Tom Lane wrote:

Marc Rechté writes:

We have a pg_restore which fails due to RAM over-consumption of the
corresponding PG backend, which ends-up with OOM killer.
The table has one PK, one index, and 3 FK constraints, active while
restoring.
The dump contains over 200M rows for that table and is in custom format,
which corresponds to 37 GB of total relation size in the original DB.

The FKs would result in queueing row trigger events, which would occupy
some memory.  But those should only need ~12 bytes per FK per row,
which works out to less than 10GB for this number of rows, so it may
be that you've hit something else that we would consider a leak.

Does memory consumption hold steady if you drop the FK constraints?

If not, as others have noted, we'd need more info to investigate
this.  The leak is probably independent of the specific data in
the table, so maybe you could make a small self-contained example
using a script to generate dummy data.

regards, tom lane



Actually the number of rows is 232735712.

Accordingly, the RAM consumption would be 232735712 rows × 12 bytes × 3 FKs ≈ 7.8 GiB.

This is close to the 8.1g I reported earlier (actually it was closer to 
7.8 GB, due to GiB vs. GB confusion).


So there is no memory leak.

It took 16 hours on my box to reach that RAM consumption, and then the 
COPY failed when checking the first FK (as the referenced table was empty).


I dropped the PK, the index, and the 3 FK constraints and started the 
pg_restore over:


11 minutes to load the table (I did not have time to note RAM consumption)

I then created the PK and index:

24 minutes

For the FKs, I don't know, because the referenced tables are empty (but I'll 
be able to test next week, if deemed necessary).


16 hours vs. 35 minutes to reach the same state.

This is the data structure:

=

--
-- Name: simulations_ecarts_relatifs_saison; Type: TABLE; Schema: 
donnees2019; Owner: drias; Tablespace:

--

CREATE TABLE simulations_ecarts_relatifs_saison (
    idpoint integer NOT NULL,
    annee integer NOT NULL,
    saison integer NOT NULL,
    idreferentiel integer NOT NULL,
    ecartreltav real,
    ecartreltnav real,
    ecartreltxav real,
    ecartreltrav real,
    ecartreltxq90 real,
    ecartreltxq10 real,
    ecartreltnq10 real,
    ecartreltnq90 real,
    ecartreltxnd smallint,
    ecartreltnnd smallint,
    ecartreltnht smallint,
    ecartreltxhwd smallint,
    ecartreltncwd smallint,
    ecartreltnfd smallint,
    ecartreltxfd smallint,
    ecartrelsd smallint,
    ecartreltr smallint,
    ecartrelhdd real,
    ecartrelcdd real,
    ecartrelpav real,
    ecartrelpint real,
    ecartrelrr real,
    ecartrelpfl90 real,
    ecartrelrr1mm real,
    ecartrelpxcwd smallint,
    ecartrelpn20mm smallint,
    ecartrelpxcdd smallint,
    ecartrelhusav real,
    ecartreltx35 real,
    ecartrelpq90 real,
    ecartrelpq99 real,
    ecartrelrr99 real,
    ecartrelffav real,
    ecartrelff3 real,
    ecartrelffq98 real,
    ecartrelff98 real
);

--
-- Name: pk_simulations_ecarts_relatifs_saison_2019; Type: CONSTRAINT; 
Schema: donnees2019; Owner: drias; Tablespace:

--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT pk_simulations_ecarts_relatifs_saison_2019 PRIMARY 
KEY (idpoint, annee, saison, idreferentiel);


--
-- Name: i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019; 
Type: INDEX; Schema: donnees2019; Owner: drias; Tablespace:

--

CREATE INDEX i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019 
ON simulations_ecarts_relatifs_saison USING btree (idreferentiel, annee, 
saison);


--
-- Name: fk_id_point_ecarts_relatifs_saison_2019; Type: FK CONSTRAINT; 
Schema: donnees2019; Owner: drias

--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_point_ecarts_relatifs_saison_2019 FOREIGN KEY 
(idpoint) REFERENCES grilles.points_grille(id);



--
-- Name: fk_id_referentiel_ecarts_relatifs_saison_2019; Type: FK 
CONSTRAINT; Schema: donnees2019; Owner: drias

--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_referentiel_ecarts_relatifs_saison_2019 
FOREIGN KEY (idreferentiel) REFERENCES 
referentiel.referentiel_simulations(id);


--
-- Name: fk_saison_ecarts_relatifs_saison_2019; Type: FK CONSTRAINT; 
Schema: donnees2019; Owner: drias

--

ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_saison_ecarts_relatifs_saison_2019 FOREIGN KEY 
(saison) REFERENCES donnees.liste_saison(code_saison);


This is how I initialized and started the test instance:

=

$ initdb -D $MYDIR
$ pg_ctl -D $MYDIR -o "-p 5432 -c unix_socket_directories=. -c 
shared_buffers=981MB -c work_mem=20MB -c maintenance_work_mem=98MB" start
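
For reference, the drop / re-create steps described above would look roughly 
like this, using the names from the DDL (a sketch; the exact commands were not 
shown, and it assumes search_path is set as in the dump):

-- before the data-only restore: drop the PK, the index and the 3 FKs
ALTER TABLE simulations_ecarts_relatifs_saison
    DROP CONSTRAINT fk_id_point_ecarts_relatifs_saison_2019,
    DROP CONSTRAINT fk_id_referentiel_ecarts_relatifs_saison_2019,
    DROP CONSTRAINT fk_saison_ecarts_relatifs_saison_2019,
    DROP CONSTRAINT pk_simulations_ecarts_relatifs_saison_2019;
DROP INDEX i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019;

-- after the load: the ~24-minute step reported above
ALTER TABLE ONLY simulations_ecarts_relatifs_saison
    ADD CONSTRAINT pk_simulations_ecarts_relatifs_saison_2019
    PRIMARY KEY (idpoint, annee, saison, idreferentiel);
CREATE INDEX i_expe_annee_saison_simulations_ecarts_relatifs_saison_2019
    ON simulations_ecarts_relatifs_saison USING btree (idreferentiel, annee, saison);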






Re: OOM killer while pg_restore

2022-03-03 Thread Tom Lane
Marc Rechté writes:
> Le 03/03/2022 à 16:31, Tom Lane a écrit :
>> Does memory consumption hold steady if you drop the FK constraints?

> Actually the number of rows is 232735712.
> Accordingly, the RAM consumption would be 232735712 rows × 12 bytes × 3 FKs ≈ 7.8 GiB.
> This is close to the 8.1g I reported earlier (actually it was closer to
> 7.8 GB, due to GiB vs. GB confusion).

> So there is no memory leak.

> It took 16 hours on my box to reach that RAM consumption, and then the 
> COPY failed when checking the first FK (as the referenced table was empty).

I'm guessing it was swapping like mad :-(

We've long recommended dropping FK constraints during bulk data loads,
and then re-establishing them later.  That's a lot cheaper than retail
validity checks, even without the memory-consumption angle.  Ideally
that sort of behavior would be automated, but nobody's gotten that
done yet.  (pg_restore does do it like that during a full restore,
but not for a data-only restore, so I guess you were doing the latter.)

regards, tom lane
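
To make that concrete for the table in this thread, re-establishing the three 
FKs after a data-only restore would look roughly like this (a sketch based on 
the constraint definitions Marc posted; each FK is then validated in a single 
set-oriented pass instead of one queued trigger event per row, and the 
NOT VALID / VALIDATE variant shown first takes a weaker lock during the 
validation step):

ALTER TABLE simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_point_ecarts_relatifs_saison_2019
    FOREIGN KEY (idpoint) REFERENCES grilles.points_grille(id) NOT VALID;
ALTER TABLE simulations_ecarts_relatifs_saison
    VALIDATE CONSTRAINT fk_id_point_ecarts_relatifs_saison_2019;

-- or in one (longer-locking) step per constraint:
ALTER TABLE simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_id_referentiel_ecarts_relatifs_saison_2019
    FOREIGN KEY (idreferentiel) REFERENCES referentiel.referentiel_simulations(id);
ALTER TABLE simulations_ecarts_relatifs_saison
    ADD CONSTRAINT fk_saison_ecarts_relatifs_saison_2019
    FOREIGN KEY (saison) REFERENCES donnees.liste_saison(code_saison);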