Re: Postgresql server gets stuck at low load
I had log_min_duration_statement set to 0 for a short period, just before and just after a stall, so I have the full list of SQL statements, which I then analyzed in pgBadger. There is no increase in the number of statements; all I can see is that every statement takes longer to process than before the stall. But following your advice, I'll check the results from pg_stat_statements.

On Fri, 5 Jun 2020 at 13:16, wrote:
>
> From: "Krzysztof Olszewski"
> To: [email protected]
> Sent: Friday, 5 June 2020 7:07:02
> Subject: Postgresql server gets stuck at low load
>
> I have a problem with one of my Postgres production servers. The server
> works fine almost all the time, but sometimes, without any increase in
> the number of transactions or statements, the machine gets stuck. CPU
> usage goes up to 100% and load up to 160%. When this happens there are
> problems connecting to the database, and even when a connection
> succeeds, simple queries take several seconds instead of milliseconds.
> Sometimes the problem stops after a while (e.g. 35 min); sometimes we
> must restart Postgres, Linux, or even KVM (the virtualization host).
>
> My hardware
> 56 cores (Intel Core Processor (Skylake, IBRS))
> 400 GB RAM
> RAID10 with about 40k IOPS
>
> OS
> CentOS Linux release 7.7.1908
> kernel 3.10.0-1062.18.1.el7.x86_64
>
> Database size 100 GB (fits entirely in memory :) )
> server_version 10.12
> effective_cache_size 192000 MB
> maintenance_work_mem 2048 MB
> max_connections 150
> shared_buffers 64000 MB
> work_mem 96 MB
>
> In the normal state I have about 500 tps, 5% CPU usage and about 3%
> load. The whole database fits in memory, so there are no reads from
> disk, only writes at about 500 IOPS with occasional spikes to 1500 IOPS,
> which this hardware handles without problems (no iowait on the cores).
> In the normal state this machine does "nothing". Connections to the
> database come from two Java-based application servers through
> connection pools, so the connection count is limited by the pool
> configuration to a maximum of 120, which is lower than the Postgres
> limit (150). In the normal state there are about 20 connections; when
> the server gets stuck, the count goes up to the maximum (120).
>
> Correlated with the stalls, I see this kind of message in the kernel
> log:
> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
> but I don't know whether it is the cause or an effect of the problem.
> I investigated with pgBadger and ... nothing strange happens, just
> normal statements.
>
> Any ideas?
>
> Thanks,
> Kris
>
> Hi Krzysztof!
>
> I would enable the pg_stat_statements extension and check whether there
> are long-running queries that should be quick.
>
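A minimal sketch of the pg_stat_statements check suggested above, assuming the module is listed in shared_preload_libraries (which needs a restart) and has been created with CREATE EXTENSION pg_stat_statements in the database being examined. The helper name top_statements_sql is hypothetical; it only prints SQL, using the PostgreSQL 10 column names total_time and mean_time (PostgreSQL 13 renamed them to total_exec_time and mean_exec_time).

```shell
#!/bin/sh
# Hypothetical helper: print a query listing the statements with the highest
# mean execution time recorded by pg_stat_statements.
# Assumes PostgreSQL 10 column names (total_time, mean_time, in milliseconds).
top_statements_sql() {
  limit=${1:-10}
  # Refuse anything that is not a non-negative integer.
  case $limit in
    ''|*[!0-9]*) echo "limit must be a non-negative integer: $limit" >&2; return 1 ;;
  esac
  cat <<EOF
SELECT calls,
       round(total_time::numeric, 1) AS total_ms,
       round(mean_time::numeric, 2) AS mean_ms,
       left(query, 80) AS query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT $limit;
EOF
}
```

For example, "top_statements_sql 20 | psql -X -d yourdb", where yourdb is a placeholder. Calling SELECT pg_stat_statements_reset(); during a normal period and comparing its output with a snapshot taken during a stall could show whether particular statements slow down or, as the pgBadger analysis suggests, everything slows down at once.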
Re: Postgresql server gets stuck at low load
I have tried with huge pages both off and on, and the problem still occurs. Thanks for the "perf top" suggestion.

Regards,
Kris

On Fri, 5 Jun 2020 at 13:38, Pavel Stehule wrote:
>
> On Fri, 5 Jun 2020 at 12:07, Krzysztof Olszewski wrote:
>
>> I have a problem with one of my Postgres production servers. The server
>> works fine almost all the time, but sometimes, without any increase in
>> the number of transactions or statements, the machine gets stuck.
>>
>> Correlated with the stalls, I see this kind of message in the kernel
>> log:
>> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
>>
>> Any ideas?
>
> You can try installing perf plus the debug symbols for Postgres. The next
> time you have this problem, run "perf top"; it shows which routines are
> eating your CPU.
>
> Maybe it is a spinlock problem:
>
> https://www.postgresql.org/message-id/CAHyXU0yAsVxoab2PcyoCuPjqymtnaE93v7bN4ctv2aNi92fefA%40mail.gmail.com
>
> It would be interesting to see your answer to Merlin's question from that
> mail:
>
> cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
> cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
>
> Regards
>
> Pavel
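Pavel's cat commands check transparent huge pages (THP), a kernel feature separate from PostgreSQL's huge_pages parameter; the thread does not say which of the two was toggled. The sketch below prints only the active value of each THP knob, which the kernel marks in brackets (e.g. "always madvise [never]"). It assumes that CentOS 7 kernels expose these files under /sys/kernel/mm/transparent_hugepage, while the redhat_transparent_hugepage path from Merlin's mail belongs to older RHEL 6 kernels, so it simply checks both. The helper name thp_active is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read one THP setting line on stdin and print only
# the bracketed (active) value, e.g. "always madvise [never]" -> "never".
thp_active() {
  sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Assumption: upstream/RHEL 7 path first, RHEL 6 path second.
for dir in /sys/kernel/mm/transparent_hugepage /sys/kernel/mm/redhat_transparent_hugepage; do
  for knob in enabled defrag; do
    if [ -r "$dir/$knob" ]; then
      printf '%s/%s: %s\n' "$dir" "$knob" "$(thp_active < "$dir/$knob")"
    fi
  done
done
```

During a stall, "sudo perf top" with the PostgreSQL debug symbols installed, as Pavel suggests, shows where the CPU time goes; a profile dominated by spinlock functions such as PostgreSQL's s_lock() would support the spinlock theory, though the thread does not establish it.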
Re: Postgresql server gets stuck at low load
Hi,

On Fri, Jun 5, 2020 at 7:07 AM Krzysztof Olszewski wrote:

> I have a problem with one of my Postgres production servers. The server
> works fine almost all the time, but sometimes, without any increase in
> the number of transactions or statements, the machine gets stuck.
>
> Database size 100 GB (fits entirely in memory :) )
> server_version 10.12
> effective_cache_size 192000 MB
> maintenance_work_mem 2048 MB
> max_connections 150
> shared_buffers 64000 MB
> work_mem 96 MB
>

What is random_page_cost set to? Try setting it to 1 (the same as the default seq_page_cost) for a while.

--
Regards,
Avinash Vallarapu
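One low-risk way to try the random_page_cost suggestion "for a while" is a session-level SET, which affects only the current connection. The sketch below uses a hypothetical helper, plan_at_costs_sql, that prints a psql script showing the plan of one query under each given random_page_cost value. It uses plain EXPLAIN, so the query itself is not executed.

```shell
#!/bin/sh
# Hypothetical helper: print a psql script that shows the plan of one query
# (passed without a trailing semicolon) under each given random_page_cost.
# SET only changes the current session; postgresql.conf is untouched.
plan_at_costs_sql() {
  query=$1
  shift
  # Validate every cost first so nothing is printed on bad input.
  for cost in "$@"; do
    case $cost in
      ''|.|*[!0-9.]*) echo "invalid cost: $cost" >&2; return 1 ;;
    esac
  done
  for cost in "$@"; do
    printf 'SET random_page_cost = %s;\n' "$cost"
    printf 'EXPLAIN %s;\n' "$query"
  done
  printf 'RESET random_page_cost;\n'
}
```

For example: plan_at_costs_sql 'SELECT * FROM some_table WHERE id = 42' 1.1 1 | psql -X -d yourdb, where some_table and yourdb are placeholders. To apply the value server-wide instead, ALTER SYSTEM SET random_page_cost = 1 followed by SELECT pg_reload_conf() takes effect without a restart, and ALTER SYSTEM RESET random_page_cost plus another reload undoes it.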
Re: Postgresql server gets stuck at low load
random_page_cost = 1.1

On Tue, 9 Jun 2020 at 14:01, Avinash Kumar wrote:

> Hi,
>
> On Fri, Jun 5, 2020 at 7:07 AM Krzysztof Olszewski wrote:
>
>> server_version 10.12
>> shared_buffers 64000 MB
>>
> What is random_page_cost set to?
> Try setting it to 1 (the same as the default seq_page_cost) for a while.
>
> --
> Regards,
> Avinash Vallarapu
Re: Postgresql server gets stuck at low load
On Tue, Jun 09, 2020 at 01:54:21PM +0200, Krzysztof Olszewski wrote:
> I have tried with huge pages both off and on, and the problem still occurs.
> Thanks for the "perf top" suggestion.
> On Fri, 5 Jun 2020 at 13:38, Pavel Stehule wrote:
> > On Fri, 5 Jun 2020 at 12:07, Krzysztof Olszewski wrote:
> >
> >> I have a problem with one of my Postgres production servers. The server
> >> works fine almost all the time, but sometimes, without any increase in
> >> the number of transactions or statements, the machine gets stuck.
> >>
> >> My hardware
> >> 56 cores (Intel Core Processor (Skylake, IBRS))
> >> 400 GB RAM
> >> RAID10 with about 40k IOPS
> >>
> >> shared_buffers 64000 MB
> >>
> >> Correlated with the stalls, I see this kind of message in the kernel
> >> log:
> >> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
> >
> > https://www.postgresql.org/message-id/CAHyXU0yAsVxoab2PcyoCuPjqymtnaE93v7bN4ctv2aNi92fefA%40mail.gmail.com
> >
> > It would be interesting to see your answer to Merlin's question from that
> > mail:
> >
> > cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
> > cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

Try this:

echo 2 | sudo tee /sys/kernel/mm/ksm/run

https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

--
Justin
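Justin's suggestion concerns Kernel Samepage Merging (KSM), which KVM hosts often enable to merge identical guest memory pages. In /sys/kernel/mm/ksm/run, 0 stops the ksmd thread, 1 runs it, and 2 stops it and unmerges all currently merged pages; writing the file needs root, so the value has to go through something like sudo tee, because a plain shell redirection would run as the unprivileged user. The read-only sketch below, built around a hypothetical helper ksm_report, prints the current state so it can be compared before and after the change. The thread does not establish whether the guest or the KVM host is the one that matters, so it may be worth checking both.

```shell
#!/bin/sh
# Hypothetical helper: report KSM state from a sysfs-style directory
# (default /sys/kernel/mm/ksm). Read-only; it never writes to "run".
ksm_report() {
  dir=${1:-/sys/kernel/mm/ksm}
  if [ ! -r "$dir/run" ]; then
    echo "KSM not available at $dir" >&2
    return 1
  fi
  run=$(cat "$dir/run")
  # Meaning of the values documented for /sys/kernel/mm/ksm/run.
  case $run in
    0) state="stopped" ;;
    1) state="running" ;;
    2) state="stopped, merged pages unmerged" ;;
    *) state="unknown ($run)" ;;
  esac
  shared=$(cat "$dir/pages_shared" 2>/dev/null || echo "?")
  sharing=$(cat "$dir/pages_sharing" 2>/dev/null || echo "?")
  printf 'run=%s (%s) pages_shared=%s pages_sharing=%s\n' \
    "$run" "$state" "$shared" "$sharing"
}

# Only report on the real system when the KSM interface exists.
if [ -r /sys/kernel/mm/ksm/run ]; then
  ksm_report
fi
```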
