Database Stalls

2023-01-30 Thread Mok
Hi,

We've started to observe instances of one of our databases stalling for a
few seconds.

We see a spike in wal write locks then nothing for a few seconds. After
which we have spike latency as processes waiting to get to the db can do
so.

There is nothing in the postgres logs that give us any clues to what could
be happening, no locks, unusually high/long running transactions, just a
pause and resume.

Could anyone give me any advice as to what to look for when it comes to
checking the underlying disk that the db is on?

Thanks,

Gurmokh


Re: Database Stalls

2023-01-30 Thread Justin Pryzby
On Mon, Jan 30, 2023 at 05:47:49PM +, Mok wrote:
> Hi,
> 
> We've started to observe instances of one of our databases stalling for a
> few seconds.
> 
> We see a spike in wal write locks then nothing for a few seconds. After
> which we have spike latency as processes waiting to get to the db can do
> so.
> 
> There is nothing in the postgres logs that give us any clues to what could
> be happening, no locks, unusually high/long running transactions, just a
> pause and resume.
> 
> Could anyone give me any advice as to what to look for when it comes to
> checking the underlying disk that the db is on?

What version postgres?  What settings have non-default values ? 


What OS/version?  What environment/hardware?  VM/image/provider/... 



Have you enabled logging for vacuum/checkpoints/locks ?

https://wiki.postgresql.org/wiki/Slow_Query_Questions




Re: Database Stalls

2023-01-30 Thread José Arthur Benetasso Villanova
On Mon, Jan 30, 2023 at 2:51 PM Justin Pryzby  wrote:

> On Mon, Jan 30, 2023 at 05:47:49PM +, Mok wrote:
> > Hi,
> >
> > We've started to observe instances of one of our databases stalling for a
> > few seconds.
> >
> > We see a spike in wal write locks then nothing for a few seconds. After
> > which we have spike latency as processes waiting to get to the db can do
> > so.
> >
> > There is nothing in the postgres logs that give us any clues to what
> could
> > be happening, no locks, unusually high/long running transactions, just a
> > pause and resume.
> >
> > Could anyone give me any advice as to what to look for when it comes to
> > checking the underlying disk that the db is on?
>
> What version postgres?  What settings have non-default values ?
>
>
> What OS/version?  What environment/hardware?  VM/image/provider/...
>
>
>
> Have you enabled logging for vacuum/checkpoints/locks ?
>
> https://wiki.postgresql.org/wiki/Slow_Query_Questions
>
>
> In addition to previous questions, if possible, a SELECT * FROM
pg_stat_activity at the moment of the stall. The most important information
is the wait_event column. My guess is the disk, but just the select at the
right moment can answer this.

-- 
José Arthur Benetasso Villanova


Re: Database Stalls

2023-01-30 Thread Shiv Iyer
Hi Burmokh,

Please take a look at this article copied below and ping me for further 
guidance. Thanks! 


How expensive SQLs can impact PostgreSQL Performance? - 
https://minervadb.xyz/how-expensive-sqls-can-impact-postgresql-performance/ 


—
Best
Shiv 



> On 30-Jan-2023, at 11:17 PM, Mok  wrote:
> 
> Hi, 
> 
> We've started to observe instances of one of our databases stalling for a few 
> seconds. 
> 
> We see a spike in wal write locks then nothing for a few seconds. After which 
> we have spike latency as processes waiting to get to the db can do so. 
> 
> There is nothing in the postgres logs that give us any clues to what could be 
> happening, no locks, unusually high/long running transactions, just a pause 
> and resume. 
> 
> Could anyone give me any advice as to what to look for when it comes to 
> checking the underlying disk that the db is on? 
> 
> Thanks, 
> 
> Gurmokh 
> 
> 



Re: Database Stalls

2023-01-30 Thread Mok
Hi,

Unfortunately there is no pg_stat_activity data available as we are unaware
of the issue until it has already happened.

The version we are on is 12.11.

I don't think it is due to locks as there are none in the logs. Vacuums are
logged also and none occur before or after this event. Checkpoint timeout
is set to 1 hour and these events do not coincide with checkpoints.

Gurmokh

On Mon, 30 Jan 2023 at 18:47, Shiv Iyer  wrote:

> Hi Burmokh,
>
> Please take a look at this article copied below and ping me for further
> guidance. Thanks!
>
>
> How expensive SQLs can impact PostgreSQL Performance? -
> https://minervadb.xyz/how-expensive-sqls-can-impact-postgresql-performance/
>
>
> —
> Best
> Shiv
>
>
>
> On 30-Jan-2023, at 11:17 PM, Mok  wrote:
>
> Hi,
>
> We've started to observe instances of one of our databases stalling for a
> few seconds.
>
> We see a spike in wal write locks then nothing for a few seconds. After
> which we have spike latency as processes waiting to get to the db can do
> so.
>
> There is nothing in the postgres logs that give us any clues to what could
> be happening, no locks, unusually high/long running transactions, just a
> pause and resume.
>
> Could anyone give me any advice as to what to look for when it comes to
> checking the underlying disk that the db is on?
>
> Thanks,
>
> Gurmokh
>
>
>
>


Fwd: Database Stalls

2023-01-30 Thread Rick Otten
On Mon, Jan 30, 2023 at 4:32 PM Mok  wrote:

> Hi,
>
> Unfortunately there is no pg_stat_activity data available as we are
> unaware of the issue until it has already happened.
>
> The version we are on is 12.11.
>
> I don't think it is due to locks as there are none in the logs. Vacuums
> are logged also and none occur before or after this event. Checkpoint
> timeout is set to 1 hour and these events do not coincide with checkpoints.
>
> Gurmokh
>
>>
>>
Have you eliminated network issues?  I have seen what looks like a database
stalling to end up actually being the network packets taking a side trip to
halfway around the world for a while.  Or DNS lookups suddenly taking a
really long time.

The next most likely thing is disk i/o.  Do you have huge corresponding
disk i/o spikes or does it drop completely to zero (which is also bad -
especially if you are on a SAN and you can't get any packets out on that
network).  You'll have to look at your disks via OS tools to see.

Do you have any hardware faults?  Errors on a hardware bus?  Overheating?
I used to have a system that would freeze up entirely due to a problem with
a serial port that we had a console attached to - it was sending a low
level interrupt.  Sometimes it would recover mysteriously if someone hit
the carriage return a couple times.  Ie, is it _really_ the database that
is locking up, or is it your hardware?