LWLocks by LockManager slowing large DB

2021-04-12 Thread Paul Friedman
Hello, apologies for the long post, but I want to make sure I’ve got enough
details to describe the problem for y’all.



I’ve got a 64-core instance (Ubuntu 18.04, 240 GB RAM, running on GCP) with
PG 13.2 and PostGIS 3.1.1, and we’re having trouble getting it to run more
than 30 or so large queries at the same time against the same tables.  With
60 threads, each thread runs at only ~30% CPU with no disk I/O or iowait
(once the tables become cached).



Boiling the complex queries down to their simplest form, we test running 60
of this query simultaneously:



select
  count(*)
from
  travel_processing_v5.llc_zone z,
  parent_set10.usca_trip_points7 t
where t.year_num = 2019 and t.month_num = 9
  and st_intersects(t.lock_geom, z.s_geom)
  and st_intersects(t.lock_geom, z.e_geom);



llc_zone = 981 rows (568 kB on disk), with s_geom and e_geom both of datatype
geometry(Multipolygon, 2163)

usca_trip_points7 = 79 million rows (469 GB on disk), with lock_geom of
datatype geometry(Linestring, 2163)

(more detailed schema/stats can be provided if helpful)



postgresql.conf is pretty normal for a large system like this (with
appropriate shared_buffers, work_mem, etc.; it can be provided if helpful, too)



What I’m finding in pg_stat_activity when running this is lots of
wait_events of type ‘LockManager’.

Rebuilding with CFLAGS=" -fno-omit-frame-pointer"
--prefix=/usr/local/pgsql_13debug --enable-dtrace CPPFLAGS='-DLOCK_DEBUG'
and then setting trace_lwlocks yields lots of records looking like:



[39691] LOG:  39691: LWLockAcquire(LockManager 0x7fab2cc09d80): excl 0
shared 0 haswaiters 1 waiters 6 rOK 1
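
For reference, a minimal sketch of toggling that tracing at runtime (assuming
superuser access; trace_lwlocks is a developer option and only takes effect
when the server was compiled with -DLOCK_DEBUG, as above):

  -- Enable LWLock tracing; records go to the server log.
  ALTER SYSTEM SET trace_lwlocks = on;
  SELECT pg_reload_conf();
  -- ... reproduce the workload, then turn tracing back off ...
  ALTER SYSTEM SET trace_lwlocks = off;
  SELECT pg_reload_conf();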



Does anyone have any advice on how to alleviate LockManager’s LWLock issue?



Thanks for any assistance!



---Paul



Paul Friedman

CTO



677 Harrison St  |  San Francisco, CA 94107

M: (650) 270-7676

E-mail: [email protected]


Re: LWLocks by LockManager slowing large DB

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 12:37:42 -0700, Paul Friedman wrote:
> Boiling the complex queries down to their simplest form, we test running 60
> of this query simultaneously:

How long does one execution of these queries take (on average)? The
likely bottlenecks are very different between running 60 concurrent
queries that each complete in 0.1ms and ones that take > 1s.


Could you show the results for a query like
SELECT state, backend_type, wait_event_type, wait_event, count(*) FROM 
pg_stat_activity GROUP BY 1, 2, 3, 4 ORDER BY count(*) DESC;
?

Without knowing the proportion of LockManager wait events compared to
the rest it's hard to know what to make of it.
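
A variant of that query (a sketch, not something from the original exchange)
that restricts to active client backends and turns the counts into proportions:

  SELECT wait_event_type, wait_event,
         count(*) AS backends,
         round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct
  FROM pg_stat_activity
  WHERE state = 'active' AND backend_type = 'client backend'
  GROUP BY 1, 2
  ORDER BY backends DESC;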


> Does anyone have any advice on how to alleviate LockManager’s LWLock issue?

It'd be useful to get a perf profile for this. Both for cpu time and for
what ends up blocking on kernel-level locks. E.g. something like

# cpu time
perf record --call-graph dwarf -F 500 -a sleep 5
perf report --no-children --sort comm,symbol

To send it to the list you can use something like
perf report --no-children --sort comm,symbol|head -n 500 > somefile

# kernel level blocking on locks
perf record --call-graph dwarf -e syscalls:sys_enter_futex -a sleep 3
perf report --no-children --sort comm,symbol

Greetings,

Andres Freund




RE: LWLocks by LockManager slowing large DB

2021-04-12 Thread Paul Friedman
Thanks for the quick reply!



These queries take ~1 hr each and are the only thing running on the system (all
60 are launched at the same time, and the tables/files are fully primed into
memory, so iowaits are basically zero).



Yes, that’s the same query I’ve been running to analyze the locks and this
is the problem:



SELECT state, backend_type, wait_event_type, wait_event, count(*) FROM
pg_stat_activity GROUP BY 1, 2, 3, 4 ORDER BY count(*) DESC;



 state  | backend_type                 | wait_event_type | wait_event          | count
--------+------------------------------+-----------------+---------------------+-------
 active | client backend               | LWLock          | LockManager         |    28
 active | client backend               |                 |                     |    21
        | autovacuum launcher          | Activity        | AutoVacuumMain      |     1
        | logical replication launcher | Activity        | LogicalLauncherMain |     1
        | checkpointer                 | Activity        | CheckpointerMain    |     1
 idle   | client backend               | Client          | ClientRead          |     1
        | background writer            | Activity        | BgWriterMain        |     1
        | walwriter                    | Activity        | WalWriterMain       |     1


Thanks again for any advice you have.



---Paul








Re: LWLocks by LockManager slowing large DB

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 15:15:05 -0700, Paul Friedman wrote:
> Thanks again for any advice you have.

I think we'd need the perf profiles to be able to dig into this
further.

It's odd that there is a meaningful amount of LockManager contention in
your case - assuming the stats you collected weren't just from the first few
milliseconds of starting those 60 queries, there shouldn't be any
additional "heavyweight locks" taken given the duration of your queries.

The futex profile will hopefully tell us where that is coming
from...

Greetings,

Andres Freund




RE: LWLocks by LockManager slowing large DB

2021-04-12 Thread Paul Friedman
Yes, I ran the query after a couple of minutes.  Those are the
steady-state numbers.

Also 'top' shows:

top - 22:44:26 up 12 days, 23:14,  5 users,  load average: 20.99, 21.35, 19.27
Tasks: 859 total,  26 running, 539 sleeping,   0 stopped,   0 zombie
%Cpu(s): 34.3 us,  1.6 sy,  0.0 ni, 64.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 24742353+total, 33723356 free, 73160656 used, 14053952+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 17132937+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30070 postgres  20   0 41.232g  28608  18192 R  53.8  0.0  17:35.74 postgres
30087 postgres  20   0 41.233g  28408  18180 S  53.8  0.0  17:35.69 postgres
30055 postgres  20   0 41.233g  28492  18120 R  53.5  0.0  17:41.51 postgres

Note the postgres processes are only running at ~53% CPU, with the system at
64% idle.  The low (~1.6%) system time seems indicative of the spinlocks
blocking the major processing.

Do you know what resource the LockManager might be blocking on/protecting
with these LWLocks?

Also, I didn't understand your comment about a 'futex profile' - could you
point me in the right direction here?

Thanks.

---Paul






Re: LWLocks by LockManager slowing large DB

2021-04-12 Thread Andres Freund
Hi,

On 2021-04-12 15:56:08 -0700, Paul Friedman wrote:
> Also, I didn't understand your comment about a 'futex profile', could you
> point me in the right direction here?

My earlier mail included a section about running a perf profile showing
the callers of the futex() system call, which in turn is what lwlocks
end up using on linux when the lock is contended.

Check the second half of:
https://www.postgresql.org/message-id/20210412215738.xytq33wlciljyva5%40alap3.anarazel.de

Greetings,

Andres Freund




Re: LWLocks by LockManager slowing large DB

2021-04-12 Thread Nikolay Samokhvalov
On Mon, Apr 12, 2021 at 14:57 Andres Freund wrote:

> Without knowing the proportion of LockManager wait events compared to
> the rest it's hard to know what to make of it.


These OSS tools can be useful to understand the proportion:

- pgCenter
https://github.com/lesovsky/pgcenter
- pg_wait_sampling (can be used together with POWA monitoring; see the sample
query after this list)
https://github.com/postgrespro/pg_wait_sampling
- pgsentinel
https://github.com/pgsentinel/pgsentinel
- PASH Viewer (good for visualization, integrates with pgsentinel)
https://github.com/dbacvetkov/PASH-Viewer
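
For example, with pg_wait_sampling installed, the proportions can be read
straight from its sampled profile view (a sketch; assumes the extension is
available and preloaded on the server):

  CREATE EXTENSION IF NOT EXISTS pg_wait_sampling;

  -- Wait-event samples aggregated across all backends.
  SELECT event_type, event, sum(count) AS samples
  FROM pg_wait_sampling_profile
  GROUP BY 1, 2
  ORDER BY samples DESC;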
