postgres backend process hang on " D " state

2022-05-29 Thread James Pang (chaolpan)
Hi,
   We have a performance test on Postgresql 13.4 on RHEL8.4 , just after 
connection storm in ( 952 new connections coming in 1 minute),  a lot of 
backends start on " D " state,  and when more sessions got disconnected, they 
do not exit successfully, instead became  "defunct".   No errors from 
postgresql.log , just after the connection storm, some pg_cron workers can not 
started either.  The server is a Virtual machine and no IO hang (actually) IO 
load is very low.   Could be a postgresql bug or an OS abnormal behavior?

top - 13:18:02 up 4 days,  6:59,  6 users,  load average: 308.68, 307.93, 307.40
Tasks: 1690 total,   1 running, 853 sleeping,   0 stopped, 836 zombie
%Cpu(s):  0.1 us,  0.8 sy,  0.0 ni, 99.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128657.6 total,   1188.7 free,  52921.5 used,  74547.4 buff/cache
MiB Swap:   3072.0 total,   3066.7 free,  5.3 used.  74757.3 avail Mem

PID USER  PR  NIVIRTRESSHR S  %CPU  %MEM TIME+ COMMAND
1070325 postgres  20   0   45.6g  15356  11776 D   0.5   0.0  19:21.37 
postgres: xuser  xdb 192.x.x.132(33318) BIND
1070328 postgres  20   0   45.6g  16120  11660 D   0.5   0.0  19:21.96 
postgres: xuser  xdb 192.x.x.121(34372) BIND
1070329 postgres  20   0   45.6g  15380  11872 D   0.5   0.0  19:20.90 
postgres: xuser  xdb 192.x.x.126(41316) BIND
1070397 postgres  20   0   45.6g  14804  11604 D   0.5   0.0  19:23.57 
postgres: xuser  xdb 192.x.x.132(33324) BIND
1070434 postgres  20   0   45.6g  14928  11812 D   0.5   0.0  19:21.14 
postgres: xuser  xdb 192.x.x.129(57298) BIND
1070480 postgres  20   0   45.6g  14612  11660 D   0.5   0.0  19:19.88 
postgres: xuser  xdb 192.x.x.127(52424) BIND
1070508 postgres  20   0   45.6g  14928  11812 D   0.5   0.0  19:20.48 
postgres: xuser  xdb 192.x.x.127(52428) BIND
1070523 postgres  20   0   45.6g  14544  11716 D   0.5   0.0  19:22.53 
postgres: xuser  xdb 192.x.x.130(33678) BIND
1070647 postgres  20   0   45.6g  1  11660 D   0.5   0.0  19:24.36 
postgres: xuser  xdb 192.x.x.129(57316) BIND
1070648 postgres  20   0   45.6g  14352  11524 D   0.5   0.0  19:22.86 
postgres: xuser  xdb 192.x.x.133(48796) BIND
1070676 postgres  20   0   45.6g  14456  11660 D   0.5   0.0  19:21.92 
postgres: xuser  xdb 192.x.x.128(54614) BIND
1070724 postgres  20   0   45.6g  14352  11524 D   0.5   0.0  19:20.90 
postgres: xuser  xdb 192.x.x.126(41370) BIND
1070739 postgres  20   0   45.6g  14008  11412 D   0.5   0.0  19:22.69 
postgres: xuser  xdb 192.x.x.123(56164) BIND
1070786 postgres  20   0   45.6g  14352  11524 D   0.5   0.0  19:22.51 
postgres: xuser  xdb 192.x.x.121(34428) BIND
1070801 postgres  20   0   45.6g  13240  10688 D   0.5   0.0  19:22.19 
postgres: xuser  xdb 192.x.x.126(41382) BIND
1070815 postgres  20   0   45.6g  13240  10688 D   0.5   0.0  19:21.36 
postgres: xuser  xdb 192.x.x.53(55950) BIND
1070830 postgres  20   0   45.6g  13240  10688 D   0.5   0.0  19:23.80 
postgres: xuser  xdb 192.x.x.131(41704) BIND
1070841 postgres  20   0   45.6g  13304  10744 D   0.5   0.0  19:24.25 
postgres: xuser  xdb 192.x.x.131(41706) BIND
1070884 postgres  20   0   45.6g  13264  10688 D   0.5   0.0  19:20.61 
postgres: xuser  xdb 192.x.x.122(33734) BIND
1070903 postgres  20   0   45.6g  14456  11660 D   0.5   0.0  19:23.43 
postgres: xuser  xdb 192.x.x.132(33384) BIND
1070915 postgres  20   0   45.5g   7280   5372 D   0.5   0.0  19:21.20 
postgres: xuser  xdb 192.x.x.129(57350) initializing
1070941 postgres  20   0   45.5g   7280   5372 D   0.5   0.0  19:20.58 
postgres: xuser  xdb 192.x.x.124(35934) initializing
1070944 postgres  20   0   45.5g   7280   5372 D   0.5   0.0  19:18.57 
postgres: xuser  xdb 192.x.x.50(58964) initializing
1070963 postgres  20   0   45.5g   7280   5372 D   0.5   0.0  19:21.98 
postgres: xuser  xdb 192.x.x.132(33362) initializing
1070974 postgres  20   0   45.5g   7280   5372 D   0.5   0.0  19:22.76 
postgres: xuser  xdb 192.x.x.54(56774) initializing
1070986 postgres  20   0   45.5g   7284   5372 D   0.5   0.0  19:21.89 
postgres: xuser  xdb 192.x.x.132(33394) initializing
...

postgres 10711601951  0 May27 ?00:00:02 [postmaster] 
postgres 10711611951  0 May27 ?00:00:02 [postmaster] 
postgres 10711621951  0 May27 ?00:00:02 [postmaster] 
postgres 10711631951  0 May27 ?00:00:02 [postmaster] 
postgres 10711641951  0 May27 ?00:00:03 [postmaster] 
postgres 10711671951  0 May27 ?00:00:21 [postmaster] 
postgres 10711681951  0 May27 ?00:00:03 [postmaster] 
postgres 10711701951  0 May27 ?00:00:03 [postmaster] 
postgres 10711711951  0 May27 ?00:00:02 [postmaster] 
postgres 10711741951  0 May27 ?00:00:02 [postmaster] 
postgres 10711751951  0 May27 ?00:00:03 [postmaster] 
postgres 10711761951  0 May27 ?00:00:02 [postmaster] 
postgres 10711791951  0 May27 ?00:00:02 [postmaster] 
postgres 10711811951  0 May27 ?00:00:03

Re: postgres backend process hang on " D " state

2022-05-29 Thread Ranier Vilela
Em dom., 29 de mai. de 2022 às 10:20, James Pang (chaolpan) <
[email protected]> escreveu:

> Hi,
>
>We have a performance test on Postgresql 13.4 on RHEL8.4 ,
>
Hard to say with this info, but how is this " test", why not use the 13.7,
with all bugs fixes related?

regards,
Ranier Vilela

>


Re: postgres backend process hang on " D " state

2022-05-29 Thread Justin Pryzby
On Sun, May 29, 2022 at 01:20:12PM +, James Pang (chaolpan) wrote:
>We have a performance test on Postgresql 13.4 on RHEL8.4 , just after 
> connection storm in ( 952 new connections coming in 1 minute),  a lot of 
> backends start on " D " state,  and when more sessions got disconnected, they 
> do not exit successfully, instead became  "defunct".   No errors from 
> postgresql.log , just after the connection storm, some pg_cron workers can 
> not started either.  The server is a Virtual machine and no IO hang 
> (actually) IO load is very low.   Could be a postgresql bug or an OS abnormal 
> behavior?

What settings have you set ?
https://wiki.postgresql.org/wiki/Server_Configuration

What extensions do you have loaded?  \dx

Send the output of SELECT * FROM pg_stat_activity either as an attachment or in
\x mode?

What is your data dir ?  Is it on the VM's root filesystem or something else ?
Show the output of "mount".  Are there any kernel messages in /var/log/messages
or `dmesg` ?

How many relations are in your schema ?
Are you using temp tables ?
Long-running transactions ?

-- 
Justin




RE: postgres backend process hang on " D " state

2022-05-29 Thread James Pang (chaolpan)
1. extensions 
  shared_preload_libraries = 
'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'
2.  psql can not login now ,it hang there too, so can not check anything from 
pg_stats_* views
3.  one main app user and 2 schemas ,no long running transactions . 
4. we use /pgdata , it's on xfs , lvm/vg RHEL8.4 ,it's a shared storage, no use 
root filesystem.
/dev/mapper/pgdatavg-pgdatalv 500G  230G  271G  46% /pgdata
/dev/mapper/pgdatavg-pgarchivelv  190G  1.5G  189G   1% /pgarchive
/dev/mapper/pgdatavg-pgwallv  100G   34G   67G  34% /pgwal

Regards,

James 
-Original Message-
From: Justin Pryzby  
Sent: Sunday, May 29, 2022 11:02 PM
To: James Pang (chaolpan) 
Cc: [email protected]
Subject: Re: postgres backend process hang on " D " state

On Sun, May 29, 2022 at 01:20:12PM +, James Pang (chaolpan) wrote:
>We have a performance test on Postgresql 13.4 on RHEL8.4 , just after 
> connection storm in ( 952 new connections coming in 1 minute),  a lot of 
> backends start on " D " state,  and when more sessions got disconnected, they 
> do not exit successfully, instead became  "defunct".   No errors from 
> postgresql.log , just after the connection storm, some pg_cron workers can 
> not started either.  The server is a Virtual machine and no IO hang 
> (actually) IO load is very low.   Could be a postgresql bug or an OS abnormal 
> behavior?

What settings have you set ?
https://wiki.postgresql.org/wiki/Server_Configuration

What extensions do you have loaded?  \dx

Send the output of SELECT * FROM pg_stat_activity either as an attachment or in 
\x mode?

What is your data dir ?  Is it on the VM's root filesystem or something else ?
Show the output of "mount".  Are there any kernel messages in /var/log/messages 
or `dmesg` ?

How many relations are in your schema ?
Are you using temp tables ?
Long-running transactions ?

--
Justin




Re: postgres backend process hang on " D " state

2022-05-29 Thread Justin Pryzby
On Mon, May 30, 2022 at 01:19:56AM +, James Pang (chaolpan) wrote:
> 1. extensions 
>   shared_preload_libraries = 
> 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'
> 2.  psql can not login now ,it hang there too, so can not check anything from 
> pg_stats_* views
> 3.  one main app user and 2 schemas ,no long running transactions . 
> 4. we use /pgdata , it's on xfs , lvm/vg RHEL8.4 ,it's a shared storage, no 
> use root filesystem.
> /dev/mapper/pgdatavg-pgdatalv 500G  230G  271G  46% /pgdata
> /dev/mapper/pgdatavg-pgarchivelv  190G  1.5G  189G   1% /pgarchive
> /dev/mapper/pgdatavg-pgwallv  100G   34G   67G  34% /pgwal

What are the LVM PVs ?  Is it a scsi/virt device ?  Or iscsi/drbd/???

I didn't hear back if there's any kernel errors.
Is the storage broken/stuck/disconnected ?
Can you run "time find /pgdata /pgarchive /pgwal -ls |wc" ?

Could you run "ps -u postgres -O wchan="

Can you strace one of the stuck backends ?

It sounds like you'll have to restart the service or VM (forcibly if necessary)
to resolve the immediate issue and then collect the other info, and leave a
"psql" open to try to (if the problem recurs) check pg_stat_activity and other
DB info.




Re: postgres backend process hang on " D " state

2022-05-29 Thread Tom Lane
"James Pang (chaolpan)"  writes:
> 1. extensions 
>   shared_preload_libraries = 
> 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'

Can you still reproduce this if you remove all of those?

regards, tom lane




RE: postgres backend process hang on " D " state

2022-05-29 Thread James Pang (chaolpan)
Update your questions

. time find /pgdata /pgarchive /pgwal -ls |wc
  82165  903817 8397391

real0m1.120s
user0m0.432s
sys 0m0.800s

ps -u postgres -O wchan=
PID  S TTY  TIME COMMAND
   1951 -D ?00:26:37 
/usr/pgsql-13/bin/postmaster -D /pgdata -c config_file=/pgdata/postgresql.conf
   2341 -S ?00:00:06 postgres: logger
   2361 -S ?00:01:02 postgres: checkpointer
   2362 -S ?00:00:27 postgres: background 
writer
   2363 -S ?00:00:59 postgres: walwriter
   2364 -S ?00:02:00 postgres: autovacuum 
launcher
   2365 -Z ?00:00:04 [postmaster] 
   2366 do_epoll_waitS ?00:13:30 postgres: stats 
collector
   2367 do_epoll_waitS ?00:00:18 postgres: pg_cron 
launcher
   2368 -S ?00:00:00 postgres: logical 
replication launcher
1053144 -Z ?00:05:36 [postmaster] 
1053319 -Z ?00:05:29 [postmaster] 
1053354 -Z ?00:05:53 [postmaster] 
1053394 -Z ?00:05:51 [postmaster] 
...
1064387 -Z ?00:05:13 [postmaster] 
1070257 -D ?00:24:23 postgres: test pbwd 
192.168.205.53(55886) BIND
1070258 -D ?00:24:24 postgres: test pbwd 
192.168.205.50(58910) BIND
1070259 -D ?00:24:22 postgres: test pbwd 
192.168.205.133(48754) SELECT
1070260 -Z ?00:05:02 [postmaster] 
...

Strace / gdb will hang there too for trace a process.

Regards,

James 

-Original Message-
From: Justin Pryzby  
Sent: Monday, May 30, 2022 10:20 AM
To: James Pang (chaolpan) 
Cc: [email protected]
Subject: Re: postgres backend process hang on " D " state

On Mon, May 30, 2022 at 01:19:56AM +, James Pang (chaolpan) wrote:
> 1. extensions 
>   shared_preload_libraries = 
> 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'
> 2.  psql can not login now ,it hang there too, so can not check 
> anything from pg_stats_* views 3.  one main app user and 2 schemas ,no long 
> running transactions .
> 4. we use /pgdata , it's on xfs , lvm/vg RHEL8.4 ,it's a shared storage, no 
> use root filesystem.
> /dev/mapper/pgdatavg-pgdatalv 500G  230G  271G  46% /pgdata
> /dev/mapper/pgdatavg-pgarchivelv  190G  1.5G  189G   1% /pgarchive
> /dev/mapper/pgdatavg-pgwallv  100G   34G   67G  34% /pgwal

What are the LVM PVs ?  Is it a scsi/virt device ?  Or iscsi/drbd/???

I didn't hear back if there's any kernel errors.
Is the storage broken/stuck/disconnected ?
Can you run "time find /pgdata /pgarchive /pgwal -ls |wc" ?

Could you run "ps -u postgres -O wchan="

Can you strace one of the stuck backends ?

It sounds like you'll have to restart the service or VM (forcibly if necessary) 
to resolve the immediate issue and then collect the other info, and leave a 
"psql" open to try to (if the problem recurs) check pg_stat_activity and other 
DB info.




RE: postgres backend process hang on " D " state

2022-05-29 Thread James Pang (chaolpan)
   Maybe any bugs from these extensions ?  I can try that removing all 
extensions, but we need these extensions. 

Thanks,

James

-Original Message-
From: Tom Lane  
Sent: Monday, May 30, 2022 10:21 AM
To: James Pang (chaolpan) 
Cc: Justin Pryzby ; [email protected]
Subject: Re: postgres backend process hang on " D " state

"James Pang (chaolpan)"  writes:
> 1. extensions 
>   shared_preload_libraries = 
> 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'

Can you still reproduce this if you remove all of those?

regards, tom lane