Re: How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

2023-01-05 Thread Maxim Boguk
On Sat, Dec 31, 2022 at 2:26 PM Maxim Boguk  wrote:

> Hi,
>
> When performing post-mortem analysis of some short latency spikes on a
> heavily loaded database, I found that the reason for (less than 10 second
> latency spike) wasn't on the EXECUTE stage but on the BIND stage.
> At the same time graphical monitoring shows that during this few second
> period there were some queries waiting in the BIND stage.
>
> Logging setup:
> log_min_duration_statement=200ms
> log_lock_waits=on
> deadlock_timeout=100ms
> So I expected that every lock waiting over 100ms (>deadlock_timeout)
> should be in the log.
> But in the log I see only spikes on slow BIND but not lock waits logged.
> (
> grep BIND /var/log/postgresql/postgresql-2022-12-29.log | grep 'duration'
> | perl -pe 's/^(2022-12-29 \d\d:\d\d:\d).*$/$1/' | sort | uniq -c | less
> ...
>   9 2022-12-29 00:12:5
>   2 2022-12-29 00:13:1
>   3 2022-12-29 00:13:5
> !!!  68 2022-12-29 00:14:0
>   5 2022-12-29 00:14:1
>   3 2022-12-29 00:14:2
>   2 2022-12-29 00:14:3
> ).
> But no lock waits on the BIND stage logged during the problem period (and
> no lock waits in general).
> Similar issues happen a few times per day without any visible pattern (but
> on the same tables usually).
> No CPU or IO load/latency spikes found during problem periods.
> No EXECUTE slowdown found in the log during that time.
>


Followup research of this issue lead me to following results:
Every logged spike of BIND/PARSE response time correlated with
corresponding backend waiting on
wait_event_type = LWLock
wait_event = pg_stat_statements
and all of these spikes happen during increment of
pg_stat_statements_info.dealloc counter.

Some searching about this issue lead me to following blog post about
similar issue:
https://yhuelf.github.io/2021/09/30/pg_stat_statements_bottleneck.html

However, we already have pg_stat_statements.max=1 so further increase
of this parameter
seems counterproductive (the size of
14/main/pg_stat_tmp/pgss_query_texts.stat is already over 20MB).


Open questions remains:
1)Is it expected behaviour of pg_stat_statements to block every BIND/PARSE
during deallocation of least used entries for the whole period of cleanup?


2)Any recommended workaround for this issue for systems with strict latency
SLA
(block every database query (used extended query protocol) for 200-500ms
50+ times per day at random time - isn't acceptable for our case
unfortunately)?


3)Why only BIND/PARSE locks but not EXECUTE?
(may be some difference in implementation of plan vs exec
pg_stat_statements counters?).


Kind Regards,
Maxim


-- 
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678


Re: How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

2023-01-05 Thread MichaelDBA


What happens if you take pg_stat_statements out of the picture (remove 
from shared_preload_libraries)?  Does your BIND problem go away?


Re: How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

2023-01-05 Thread MichaelDBA
Well if you find out for sure, please let me know.  I'm very interested 
in the outcome of this problem.


Maxim Boguk wrote on 1/5/2023 6:44 AM:



On Thu, Jan 5, 2023 at 1:31 PM MichaelDBA > wrote:



What happens if you takepg_stat_statements out of the picture
(remove from shared_preload_libraries)?  Does your BIND problem go
away?


I didn't test this idea, because it requires restart of the database 
(it cannot be done quickly) and without pg_stat_statementsthere will 
be no adequate performance monitoring of the database.
But I'm pretty sure that the issue will go away with 
pg_stat_statements disabled.


--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678







Re: How to analyze of short but heavy intermittent slowdown on BIND on production database (or BIND vs log_lock_waits)

2023-01-05 Thread Maxim Boguk
On Thu, Jan 5, 2023 at 1:31 PM MichaelDBA  wrote:

>
> What happens if you take pg_stat_statements out of the picture (remove
> from shared_preload_libraries)?  Does your BIND problem go away?
>

I didn't test this idea, because it requires restart of the database (it
cannot be done quickly) and without pg_stat_statements there will be no
adequate performance monitoring of the database.
But I'm pretty sure that the issue will go away with pg_stat_statements
disabled.

-- 
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 
Phone AU: +61  45 218 5678