postgres backend process hang on " D " state
Hi,

We have a performance test on PostgreSQL 13.4 on RHEL 8.4. Just after a connection storm (952 new connections arriving within 1 minute), a lot of backends went into "D" state, and as more sessions got disconnected they did not exit successfully but instead became "defunct". There are no errors in postgresql.log. Right after the connection storm, some pg_cron workers could not be started either. The server is a virtual machine and there is no I/O hang; actual I/O load is very low. Could this be a PostgreSQL bug or abnormal OS behavior?

top - 13:18:02 up 4 days,  6:59,  6 users,  load average: 308.68, 307.93, 307.40
Tasks: 1690 total,   1 running, 853 sleeping,   0 stopped, 836 zombie
%Cpu(s):  0.1 us,  0.8 sy,  0.0 ni, 99.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 128657.6 total,   1188.7 free,  52921.5 used,  74547.4 buff/cache
MiB Swap:   3072.0 total,   3066.7 free,      5.3 used.  74757.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1070325 postgres  20   0   45.6g  15356  11776 D   0.5  0.0  19:21.37 postgres: xuser xdb 192.x.x.132(33318) BIND
1070328 postgres  20   0   45.6g  16120  11660 D   0.5  0.0  19:21.96 postgres: xuser xdb 192.x.x.121(34372) BIND
1070329 postgres  20   0   45.6g  15380  11872 D   0.5  0.0  19:20.90 postgres: xuser xdb 192.x.x.126(41316) BIND
1070397 postgres  20   0   45.6g  14804  11604 D   0.5  0.0  19:23.57 postgres: xuser xdb 192.x.x.132(33324) BIND
1070434 postgres  20   0   45.6g  14928  11812 D   0.5  0.0  19:21.14 postgres: xuser xdb 192.x.x.129(57298) BIND
1070480 postgres  20   0   45.6g  14612  11660 D   0.5  0.0  19:19.88 postgres: xuser xdb 192.x.x.127(52424) BIND
1070508 postgres  20   0   45.6g  14928  11812 D   0.5  0.0  19:20.48 postgres: xuser xdb 192.x.x.127(52428) BIND
1070523 postgres  20   0   45.6g  14544  11716 D   0.5  0.0  19:22.53 postgres: xuser xdb 192.x.x.130(33678) BIND
1070647 postgres  20   0   45.6g      1  11660 D   0.5  0.0  19:24.36 postgres: xuser xdb 192.x.x.129(57316) BIND
1070648 postgres  20   0   45.6g  14352  11524 D   0.5  0.0  19:22.86 postgres: xuser xdb 192.x.x.133(48796) BIND
1070676 postgres  20   0   45.6g  14456  11660 D   0.5  0.0  19:21.92 postgres: xuser xdb 192.x.x.128(54614) BIND
1070724 postgres  20   0   45.6g  14352  11524 D   0.5  0.0  19:20.90 postgres: xuser xdb 192.x.x.126(41370) BIND
1070739 postgres  20   0   45.6g  14008  11412 D   0.5  0.0  19:22.69 postgres: xuser xdb 192.x.x.123(56164) BIND
1070786 postgres  20   0   45.6g  14352  11524 D   0.5  0.0  19:22.51 postgres: xuser xdb 192.x.x.121(34428) BIND
1070801 postgres  20   0   45.6g  13240  10688 D   0.5  0.0  19:22.19 postgres: xuser xdb 192.x.x.126(41382) BIND
1070815 postgres  20   0   45.6g  13240  10688 D   0.5  0.0  19:21.36 postgres: xuser xdb 192.x.x.53(55950) BIND
1070830 postgres  20   0   45.6g  13240  10688 D   0.5  0.0  19:23.80 postgres: xuser xdb 192.x.x.131(41704) BIND
1070841 postgres  20   0   45.6g  13304  10744 D   0.5  0.0  19:24.25 postgres: xuser xdb 192.x.x.131(41706) BIND
1070884 postgres  20   0   45.6g  13264  10688 D   0.5  0.0  19:20.61 postgres: xuser xdb 192.x.x.122(33734) BIND
1070903 postgres  20   0   45.6g  14456  11660 D   0.5  0.0  19:23.43 postgres: xuser xdb 192.x.x.132(33384) BIND
1070915 postgres  20   0   45.5g   7280   5372 D   0.5  0.0  19:21.20 postgres: xuser xdb 192.x.x.129(57350) initializing
1070941 postgres  20   0   45.5g   7280   5372 D   0.5  0.0  19:20.58 postgres: xuser xdb 192.x.x.124(35934) initializing
1070944 postgres  20   0   45.5g   7280   5372 D   0.5  0.0  19:18.57 postgres: xuser xdb 192.x.x.50(58964) initializing
1070963 postgres  20   0   45.5g   7280   5372 D   0.5  0.0  19:21.98 postgres: xuser xdb 192.x.x.132(33362) initializing
1070974 postgres  20   0   45.5g   7280   5372 D   0.5  0.0  19:22.76 postgres: xuser xdb 192.x.x.54(56774) initializing
1070986 postgres  20   0   45.5g   7284   5372 D   0.5  0.0  19:21.89 postgres: xuser xdb 192.x.x.132(33394) initializing
...

postgres 1071160    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071161    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071162    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071163    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071164    1951  0 May27 ?        00:00:03 [postmaster]
postgres 1071167    1951  0 May27 ?        00:00:21 [postmaster]
postgres 1071168    1951  0 May27 ?        00:00:03 [postmaster]
postgres 1071170    1951  0 May27 ?        00:00:03 [postmaster]
postgres 1071171    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071174    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071175    1951  0 May27 ?        00:00:03 [postmaster]
postgres 1071176    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071179    1951  0 May27 ?        00:00:02 [postmaster]
postgres 1071181    1951  0 May27 ?        00:00:03
Re: postgres backend process hang on " D " state
On Sun, May 29, 2022 at 10:20, James Pang (chaolpan) <[email protected]> wrote:
> Hi,
>
> We have a performance test on PostgreSQL 13.4 on RHEL 8.4.

Hard to say with this info, but since this is a "test", why not use 13.7, which has all the related bug fixes?

regards,
Ranier Vilela
Re: postgres backend process hang on " D " state
On Sun, May 29, 2022 at 01:20:12PM +, James Pang (chaolpan) wrote:
> We have a performance test on PostgreSQL 13.4 on RHEL 8.4. Just after a
> connection storm (952 new connections arriving within 1 minute), a lot of
> backends went into "D" state, and as more sessions got disconnected they did
> not exit successfully but instead became "defunct". There are no errors in
> postgresql.log. Right after the connection storm, some pg_cron workers could
> not be started either. The server is a virtual machine and there is no I/O
> hang; actual I/O load is very low. Could this be a PostgreSQL bug or
> abnormal OS behavior?

What settings have you set?
https://wiki.postgresql.org/wiki/Server_Configuration

What extensions do you have loaded? \dx

Send the output of SELECT * FROM pg_stat_activity, either as an attachment or in \x mode?

What is your data dir? Is it on the VM's root filesystem or something else? Show the output of "mount".

Are there any kernel messages in /var/log/messages or `dmesg`?

How many relations are in your schema? Are you using temp tables? Long-running transactions?

--
Justin
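For reference, the requested information can be gathered from a shell on the database server roughly as below (a sketch, assuming psql can still connect; the pg_settings query is the usual non-default-settings query from the wiki page above):

    # non-default server settings (per the wiki page above)
    psql -X -c "SELECT name, current_setting(name), source
                FROM pg_settings WHERE source NOT IN ('default', 'override');"

    # installed extensions in the current database
    psql -X -c "\dx"

    # one record per backend, expanded output, saved to a file for attaching
    psql -X -x -c "SELECT * FROM pg_stat_activity;" > pg_stat_activity.out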
RE: postgres backend process hang on " D " state
1. Extensions:
   shared_preload_libraries = 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'
2. psql cannot log in now; it hangs too, so we cannot check anything from the pg_stat_* views.
3. One main app user and 2 schemas, no long-running transactions.
4. We use /pgdata; it is on xfs on LVM (RHEL 8.4). It is shared storage, not the root filesystem.
   /dev/mapper/pgdatavg-pgdatalv     500G  230G  271G  46%  /pgdata
   /dev/mapper/pgdatavg-pgarchivelv  190G  1.5G  189G   1%  /pgarchive
   /dev/mapper/pgdatavg-pgwallv      100G   34G   67G  34%  /pgwal

Regards,

James

-----Original Message-----
From: Justin Pryzby
Sent: Sunday, May 29, 2022 11:02 PM
To: James Pang (chaolpan)
Cc: [email protected]
Subject: Re: postgres backend process hang on " D " state

On Sun, May 29, 2022 at 01:20:12PM +, James Pang (chaolpan) wrote:
> We have a performance test on PostgreSQL 13.4 on RHEL 8.4. Just after a
> connection storm (952 new connections arriving within 1 minute), a lot of
> backends went into "D" state, and as more sessions got disconnected they did
> not exit successfully but instead became "defunct". There are no errors in
> postgresql.log. Right after the connection storm, some pg_cron workers could
> not be started either. The server is a virtual machine and there is no I/O
> hang; actual I/O load is very low. Could this be a PostgreSQL bug or
> abnormal OS behavior?

What settings have you set?
https://wiki.postgresql.org/wiki/Server_Configuration

What extensions do you have loaded? \dx

Send the output of SELECT * FROM pg_stat_activity, either as an attachment or in \x mode?

What is your data dir? Is it on the VM's root filesystem or something else? Show the output of "mount".

Are there any kernel messages in /var/log/messages or `dmesg`?

How many relations are in your schema? Are you using temp tables? Long-running transactions?

--
Justin
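For context on the shared-storage layout, the device stack behind these logical volumes can be shown with the standard LVM and util-linux tools (a sketch; exact output depends on the environment):

    # physical volumes, volume groups, and which devices back each LV
    pvs
    vgs
    lvs -o +devices pgdatavg

    # the full block-device stack under the data LV
    lsblk -s /dev/mapper/pgdatavg-pgdatalv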
Re: postgres backend process hang on " D " state
On Mon, May 30, 2022 at 01:19:56AM +, James Pang (chaolpan) wrote:
> 1. Extensions:
>    shared_preload_libraries = 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'
> 2. psql cannot log in now; it hangs too, so we cannot check anything from the pg_stat_* views.
> 3. One main app user and 2 schemas, no long-running transactions.
> 4. We use /pgdata; it is on xfs on LVM (RHEL 8.4). It is shared storage, not the root filesystem.
>    /dev/mapper/pgdatavg-pgdatalv     500G  230G  271G  46%  /pgdata
>    /dev/mapper/pgdatavg-pgarchivelv  190G  1.5G  189G   1%  /pgarchive
>    /dev/mapper/pgdatavg-pgwallv      100G   34G   67G  34%  /pgwal

What are the LVM PVs? Is it a scsi/virt device? Or iscsi/drbd/???

I didn't hear back whether there are any kernel errors. Is the storage broken/stuck/disconnected?

Can you run "time find /pgdata /pgarchive /pgwal -ls |wc"?

Could you run "ps -u postgres -O wchan="?

Can you strace one of the stuck backends?

It sounds like you'll have to restart the service or VM (forcibly if necessary) to resolve the immediate issue, then collect the other info, and leave a "psql" open so that, if the problem recurs, you can check pg_stat_activity and other DB info.
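Since attaching strace or gdb to a D-state process can itself block, the kernel wait channel and stack can usually be read directly from /proc instead (a sketch, assuming a standard Linux /proc; /proc/PID/stack needs root):

    # for every postgres process in uninterruptible sleep, show the kernel
    # function it is blocked in and (as root) its full kernel stack
    for pid in $(ps -u postgres -o pid= -o stat= | awk '$2 ~ /^D/ {print $1}'); do
        echo "=== pid $pid ==="
        cat /proc/$pid/wchan; echo       # kernel wait channel
        cat /proc/$pid/stack 2>/dev/null # kernel stack, root only
    done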
Re: postgres backend process hang on " D " state
"James Pang (chaolpan)" writes: > 1. extensions > shared_preload_libraries = > 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user' Can you still reproduce this if you remove all of those? regards, tom lane
RE: postgres backend process hang on " D " state
Updates on your questions:

time find /pgdata /pgarchive /pgwal -ls |wc
  82165  903817 8397391

real    0m1.120s
user    0m0.432s
sys     0m0.800s

ps -u postgres -O wchan=
    PID               S TTY      TIME     COMMAND
   1951 -             D ?        00:26:37 /usr/pgsql-13/bin/postmaster -D /pgdata -c config_file=/pgdata/postgresql.conf
   2341 -             S ?        00:00:06 postgres: logger
   2361 -             S ?        00:01:02 postgres: checkpointer
   2362 -             S ?        00:00:27 postgres: background writer
   2363 -             S ?        00:00:59 postgres: walwriter
   2364 -             S ?        00:02:00 postgres: autovacuum launcher
   2365 -             Z ?        00:00:04 [postmaster]
   2366 do_epoll_wait S ?        00:13:30 postgres: stats collector
   2367 do_epoll_wait S ?        00:00:18 postgres: pg_cron launcher
   2368 -             S ?        00:00:00 postgres: logical replication launcher
1053144 -             Z ?        00:05:36 [postmaster]
1053319 -             Z ?        00:05:29 [postmaster]
1053354 -             Z ?        00:05:53 [postmaster]
1053394 -             Z ?        00:05:51 [postmaster]
...
1064387 -             Z ?        00:05:13 [postmaster]
1070257 -             D ?        00:24:23 postgres: test pbwd 192.168.205.53(55886) BIND
1070258 -             D ?        00:24:24 postgres: test pbwd 192.168.205.50(58910) BIND
1070259 -             D ?        00:24:22 postgres: test pbwd 192.168.205.133(48754) SELECT
1070260 -             Z ?        00:05:02 [postmaster]
...

strace and gdb also hang when trying to trace one of these processes.

Regards,

James

-----Original Message-----
From: Justin Pryzby
Sent: Monday, May 30, 2022 10:20 AM
To: James Pang (chaolpan)
Cc: [email protected]
Subject: Re: postgres backend process hang on " D " state

On Mon, May 30, 2022 at 01:19:56AM +, James Pang (chaolpan) wrote:
> 1. Extensions:
>    shared_preload_libraries = 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'
> 2. psql cannot log in now; it hangs too, so we cannot check anything from the pg_stat_* views.
> 3. One main app user and 2 schemas, no long-running transactions.
> 4. We use /pgdata; it is on xfs on LVM (RHEL 8.4). It is shared storage, not the root filesystem.
>    /dev/mapper/pgdatavg-pgdatalv     500G  230G  271G  46%  /pgdata
>    /dev/mapper/pgdatavg-pgarchivelv  190G  1.5G  189G   1%  /pgarchive
>    /dev/mapper/pgdatavg-pgwallv      100G   34G   67G  34%  /pgwal

What are the LVM PVs? Is it a scsi/virt device? Or iscsi/drbd/???

I didn't hear back whether there are any kernel errors. Is the storage broken/stuck/disconnected?

Can you run "time find /pgdata /pgarchive /pgwal -ls |wc"?

Could you run "ps -u postgres -O wchan="?

Can you strace one of the stuck backends?

It sounds like you'll have to restart the service or VM (forcibly if necessary) to resolve the immediate issue, then collect the other info, and leave a "psql" open so that, if the problem recurs, you can check pg_stat_activity and other DB info.
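When strace and gdb hang as well (a ptrace attach has to wait for the stuck task to stop), the kernel can still be asked to log the stacks of all blocked tasks via sysrq, assuming kernel.sysrq permits it on this host (a sketch, run as root):

    # dump the stacks of all tasks in uninterruptible (D) state to the kernel log
    echo 1 > /proc/sys/kernel/sysrq     # enable sysrq if it is restricted
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 200                 # stacks appear here and in /var/log/messages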
RE: postgres backend process hang on " D " state
Could there be bugs in these extensions? I can try removing all of them, but we do need these extensions.

Thanks,

James

-----Original Message-----
From: Tom Lane
Sent: Monday, May 30, 2022 10:21 AM
To: James Pang (chaolpan)
Cc: Justin Pryzby; [email protected]
Subject: Re: postgres backend process hang on " D " state

"James Pang (chaolpan)" writes:
> 1. Extensions:
>    shared_preload_libraries = 'orafce,pgaudit,pg_cron,pg_stat_statements,set_user'

Can you still reproduce this if you remove all of those?

regards, tom lane
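If the retest without extensions is attempted, one low-effort way is to clear shared_preload_libraries for a single run and restart (a sketch; the service name is assumed to be the PGDG default on RHEL and may differ):

    # takes effect only after a restart; revert afterwards
    psql -X -c "ALTER SYSTEM SET shared_preload_libraries = '';"
    systemctl restart postgresql-13

    # to revert
    psql -X -c "ALTER SYSTEM RESET shared_preload_libraries;"
    systemctl restart postgresql-13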
