Re: Postgresql server gets stuck at low load

2020-06-09 Thread Krzysztof Olszewski
I had log_min_duration_statement set to 0 for a short period, just before
the stall and just after it, so I have the full list of SQL statements, which
I then analyzed in pgBadger. There is no increase in the number of statements,
and I can see that all statements take longer to process than before the
stall. But following your advice I'll check the results from
pg_stat_statements.
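
A minimal sketch of that pg_stat_statements check, assuming the extension is
already loaded via shared_preload_libraries and created in the database. The
function name slow_statements_sql, the default limit of 20 and the database
name "mydb" are hypothetical. The column names are the PostgreSQL 10 ones;
from PostgreSQL 13 on they are total_exec_time, mean_exec_time and so on.

```shell
#!/bin/sh
# Sketch only: print a query that lists the statements with the highest
# mean execution time. Times in pg_stat_statements are in milliseconds.
slow_statements_sql() {
  limit="${1:-20}"   # hypothetical default: 20 rows
  cat <<SQL
SELECT calls,
       round(total_time::numeric, 1) AS total_ms,
       round(mean_time::numeric, 2)  AS mean_ms,
       round(max_time::numeric, 2)   AS max_ms,
       left(query, 80)               AS query
FROM pg_stat_statements
ORDER BY mean_time DESC
LIMIT ${limit};
SQL
}

# Example (not run here; "mydb" is a placeholder):
#   psql -X -d mydb -c "$(slow_statements_sql 10)"
```

Comparing the output taken during a stall with the output from a normal
period may show whether particular statements slow down or all of them do.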

On Fri, 5 Jun 2020 at 13:16, wrote:

>
> *From: *"Krzysztof Olszewski" 
> *To: *[email protected]
> *Sent: *Friday, 5 June 2020 7:07:02
> *Subject: *Postgresql server gets stuck at low load
>
> I have a problem with one of my Postgres production servers. The server
> works fine almost always, but sometimes, without any increase in the number
> of transactions or statements, the machine gets stuck. Cores go up to 100%,
> load up to 160%. When it happens, there are problems connecting to the
> database, and even when a connection succeeds, simple queries take several
> seconds instead of milliseconds. The problem sometimes stops after a period
> of time (e.g. 35 min); sometimes we must restart Postgres, Linux, or even
> KVM (which is the virtualization host).
>
> My hardware
> 56 cores (Intel Core Processor (Skylake, IBRS))
> 400 GB RAM
> RAID10 with about 40k IOPS
>
> OS
> CentOS Linux release 7.7.1908
> kernel 3.10.0-1062.18.1.el7.x86_64
>
> Database size 100 GB (fits entirely in memory :) )
> server_version 10.12
> effective_cache_size 192000 MB
> maintenance_work_mem 2048 MB
> max_connections 150
> shared_buffers 64000 MB
> work_mem 96 MB
>
> In the normal state I have about 500 tps, 5% core usage and about 3% load;
> the whole database fits in memory, so there are no reads from disk, only
> writes at about 500 IOPS, sometimes spiking to 1500 IOPS, but on this
> hardware these values are no problem (no iowait on the cores). In the
> normal state this machine does "nothing". Connections to the database are
> created by two Java-based app servers through connection pools, so the
> connection count is limited by the pool configuration; the maximum is 120,
> which is lower than the Postgres setting (150). In the normal state there
> are about 20 connections; during a stall the count goes to the maximum (120).
>
> Along with the stalls I see messages in the kernel log such as
> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
> but I don't know whether this is a cause or an effect of the problem.
> I investigated with pgBadger and ... nothing strange happens, just normal
> statements.
>
> Any ideas?
>
> Thanks,
> Kris
>
> Hi Krzysztof!
>
> I would enable the pg_stat_statements extension and check whether there are
> long-running queries that should be quick.
>
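
To see whether the soft-lockup messages line up in time with the stalls, the
kernel log lines can be reduced to their key fields. This is only a sketch:
the function name parse_soft_lockup is hypothetical, and the pattern assumes
the exact message layout quoted in this thread.

```shell
#!/bin/sh
# Sketch only: read kernel log lines on stdin and, for each soft-lockup
# message, print "cpu seconds command pid". Other lines are ignored.
parse_soft_lockup() {
  sed -n 's/.*soft lockup - CPU#\([0-9][0-9]*\) stuck for \([0-9][0-9]*\)s! \[\([^]:]*\):\([0-9][0-9]*\)].*/\1 \2 \3 \4/p'
}

# Example (not run here):
#   dmesg -T | parse_soft_lockup
#   journalctl -k | parse_soft_lockup
```

If the same CPU or the same PID keeps showing up, that narrows down where to
look next.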


Re: Postgresql server gets stuck at low load

2020-06-09 Thread Krzysztof Olszewski
I tried with hugepages both off and on; the problem still occurs.
Thanks for the "perf top" suggestion.

Regards
Kris



On Fri, 5 Jun 2020 at 13:38, Pavel Stehule wrote:

>
>
> On Fri, 5 Jun 2020 at 12:07, Krzysztof Olszewski wrote:
>
>> Along with the stalls I see messages in the kernel log such as
>> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
>> but I don't know whether this is a cause or an effect of the problem.
>> I investigated with pgBadger and ... nothing strange happens, just normal
>> statements.
>>
>> Any ideas?
>>
>
> You can try installing perf plus the debug symbols for postgres. When the
> problem happens again, run "perf top" to see which routines are eating your
> CPU.
>
> It may be a spinlock problem.
>
>
> https://www.postgresql.org/message-id/CAHyXU0yAsVxoab2PcyoCuPjqymtnaE93v7bN4ctv2aNi92fefA%40mail.gmail.com
>
> A reply to Merlin's question from that mail could be interesting.
>
> cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
> cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
>
> Regards
>
> Pavel
>
>
>>
>> Thanks,
>> Kris
>>
>>
>>
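
One way to point "perf top" at the PostgreSQL processes only, as a sketch.
The function name perf_top_cmd is hypothetical, and the process name
"postmaster" is assumed from the kernel log line quoted in this thread; on
other installations it may be "postgres".

```shell
#!/bin/sh
# Sketch only: build a "perf top" command line for a comma-separated PID
# list. -g asks perf for call graphs, -p restricts it to the given PIDs.
perf_top_cmd() {
  pids="$1"
  if [ -z "$pids" ]; then
    echo "no PIDs given" >&2
    return 1
  fi
  printf 'perf top -g -p %s\n' "$pids"
}

# Example (not run here; needs root and the perf package):
#   sudo sh -c "$(perf_top_cmd "$(pgrep -d, -x postmaster)")"
```

Without debug symbols the output shows mostly raw addresses, which is why
the symbols are worth installing beforehand.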


Re: Postgresql server gets stuck at low load

2020-06-09 Thread Avinash Kumar
Hi,

On Fri, Jun 5, 2020 at 7:07 AM Krzysztof Olszewski 
wrote:

> Database size 100 GB (fits entirely in memory :) )
> server_version 10.12
> effective_cache_size 192000 MB
> maintenance_work_mem 2048 MB
> max_connections 150
> shared_buffers 64000 MB
> work_mem 96 MB
>
What is random_page_cost set to?
Set it to 1 (the same as the default seq_page_cost) for a moment and try it.
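
A sketch of how such a trial could be done for one session only, without
editing postgresql.conf. The function name random_page_cost_trial_sql and the
database name "mydb" are hypothetical; SHOW, SET and RESET are standard
PostgreSQL commands.

```shell
#!/bin/sh
# Sketch only: print SQL that checks random_page_cost, lowers it for the
# current session, and restores the configured value afterwards.
random_page_cost_trial_sql() {
  cat <<'SQL'
SHOW random_page_cost;
SET random_page_cost = 1;  -- session-local; equals the default seq_page_cost
-- run EXPLAIN (ANALYZE, BUFFERS) on a representative query here
RESET random_page_cost;
SQL
}

# Example (not run here; "mydb" is a placeholder):
#   random_page_cost_trial_sql | psql -X -d mydb
```

Because SET only affects the current session, the pooled application
connections keep the configured value during the trial.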


-- 
Regards,
Avinash Vallarapu


Re: Postgresql server gets stuck at low load

2020-06-09 Thread Krzysztof Olszewski
 random_page_cost  == 1.1

On Tue, 9 Jun 2020 at 14:01, Avinash Kumar wrote:

> What is random_page_cost set to?
> Set it to 1 (the same as the default seq_page_cost) for a moment and try it.


Re: Postgresql server gets stuck at low load

2020-06-09 Thread Justin Pryzby
On Tue, Jun 09, 2020 at 01:54:21PM +0200, Krzysztof Olszewski wrote:
> I tried with hugepages both off and on; the problem still occurs.
> Thanks for the "perf top" suggestion.

> On Fri, 5 Jun 2020 at 13:38, Pavel Stehule wrote:
> > On Fri, 5 Jun 2020 at 12:07, Krzysztof Olszewski wrote:
> >
> >> I have a problem with one of my Postgres production servers. The server
> >> works fine almost always, but sometimes, without any increase in the
> >> number of transactions or statements, the machine gets stuck. Cores go up
> >> to 100%, load up to 160%. When it happens, there are problems connecting
> >> to the database, and even when a connection succeeds, simple queries take
> >> several seconds instead of milliseconds. The problem sometimes stops
> >> after a period of time (e.g. 35 min); sometimes we must restart Postgres,
> >> Linux, or even KVM (which is the virtualization host).
> >>
> >> My hardware
> >> 56 cores (Intel Core Processor (Skylake, IBRS))
> >> 400 GB RAM
> >> RAID10 with about 40k IOPS
> >>
> >> shared_buffers 64000 MB
> >>
> >> Along with the stalls I see messages in the kernel log such as
> >> NMI watchdog: BUG: soft lockup - CPU#25 stuck for 23s! [postmaster:33935]
> >
> > https://www.postgresql.org/message-id/CAHyXU0yAsVxoab2PcyoCuPjqymtnaE93v7bN4ctv2aNi92fefA%40mail.gmail.com
> >
> > A reply to Merlin's question from that mail could be interesting.
> >
> > cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
> > cat /sys/kernel/mm/redhat_transparent_hugepage/defrag

try this:
echo 2 | sudo tee /sys/kernel/mm/ksm/run

https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

-- 
Justin
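
Before changing the sysfs settings mentioned in this thread, it may help to
record their current values. This is only a sketch: the function names
active_choice and ksm_run_meaning are hypothetical. As far as I know,
CentOS 7 kernels normally expose the THP files under
/sys/kernel/mm/transparent_hugepage, and the redhat_ prefix comes from older
RHEL 6 kernels, so both locations are checked.

```shell
#!/bin/sh
# Sketch only: report transparent hugepage and KSM settings.

# Print the bracketed (active) choice from a line such as
# "always madvise [never]".
active_choice() {
  printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Describe a value read from /sys/kernel/mm/ksm/run.
ksm_run_meaning() {
  case "$1" in
    0) echo "ksmd stopped, merged pages kept" ;;
    1) echo "ksmd running" ;;
    2) echo "ksmd stopped, merged pages unmerged" ;;
    *) echo "unknown" ;;
  esac
}

# Only files that exist and are readable are reported.
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag \
         /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/redhat_transparent_hugepage/defrag; do
  if [ -r "$f" ]; then
    printf '%s: %s\n' "$f" "$(active_choice "$(cat "$f")")"
  fi
done

if [ -r /sys/kernel/mm/ksm/run ]; then
  v=$(cat /sys/kernel/mm/ksm/run)
  printf 'ksm/run=%s (%s)\n' "$v" "$(ksm_run_meaning "$v")"
fi
```

Writing 2 to ksm/run, as suggested above, stops ksmd and unmerges the pages
it has merged so far.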