LWLocks by LockManager slowing large DB
Hello, apologies for the long post, but I want to make sure I've got enough details to describe the problem for y'all.

I've got a 64-core instance (Ubuntu 18.04, 240GB RAM, running at GCP) running PG 13.2 and PostGIS 3.1.1, and we're having trouble getting it to run more than 30 or so large queries at the same time accessing the same tables. With 60 threads, each thread is only running at ~30% CPU with no disk I/O / iowait (once the tables become cached).

Boiling the complex queries down to their simplest form, we test running 60 of this query simultaneously:

select count(*)
from travel_processing_v5.llc_zone z, parent_set10.usca_trip_points7 t
where t.year_num = 2019 and t.month_num = 9
and st_intersects(t.lock_geom, z.s_geom)
and st_intersects(t.lock_geom, z.e_geom);

llc_zone = 981 rows (568k disk size) with s_geom and e_geom both of datatype geometry(Multipolygon, 2163)
usca_trip_points7 = 79 million rows (469G disk size) with t.lock_geom of datatype geometry(Linestring, 2163)

(more detailed schema/stats can be provided if helpful)

postgresql.conf is pretty normal for a large system like this (with appropriate shared_buffers, work_mem, etc. - can be provided if helpful, too).

What I'm finding in pg_stat_activity when running this is lots of wait_events of type 'LockManager'.

Rebuilding with CFLAGS="-fno-omit-frame-pointer" --prefix=/usr/local/pgsql_13debug --enable-dtrace CPPFLAGS='-DLOCK_DEBUG' and then setting trace_lwlocks yields lots of records looking like:

[39691] LOG: 39691: LWLockAcquire(LockManager 0x7fab2cc09d80): excl 0 shared 0 haswaiters 1 waiters 6 rOK 1

Does anyone have any advice on how to alleviate LockManager's LWLock issue?

Thanks for any assistance!

---Paul
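P.S. For reference, a minimal sketch of the relevant schema (the column names and types are taken from the description above; the GiST index definitions are assumed rather than actual DDL, which can be provided if helpful):

-- Hypothetical reconstruction; the real DDL is not shown in this message.
CREATE TABLE travel_processing_v5.llc_zone (
    s_geom geometry(MultiPolygon, 2163),
    e_geom geometry(MultiPolygon, 2163)
);

CREATE TABLE parent_set10.usca_trip_points7 (
    year_num  integer,
    month_num integer,
    lock_geom geometry(LineString, 2163)
);

-- Spatial indexes assumed to exist so st_intersects() can use index scans.
CREATE INDEX ON parent_set10.usca_trip_points7 USING gist (lock_geom);
CREATE INDEX ON travel_processing_v5.llc_zone USING gist (s_geom);
CREATE INDEX ON travel_processing_v5.llc_zone USING gist (e_geom);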
Re: LWLocks by LockManager slowing large DB
Hi,

On 2021-04-12 12:37:42 -0700, Paul Friedman wrote:
> Boiling the complex queries down to their simplest form, we test running 60
> of this query simultaneously:

How long does one execution of these queries take (on average)? The likely
bottlenecks are very different between running 60 concurrent queries that
each complete in 0.1ms and ones that take > 1s.

Could you show the results for a query like

SELECT state, backend_type, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3, 4
ORDER BY count(*) DESC;

? Without knowing the proportion of LockManager wait events compared to the
rest it's hard to know what to make of it.

> Does anyone have any advice on how to alleviate LockManager's LWLock issue?

It'd be useful to get a perf profile for this. Both for cpu time and for what
ends up blocking on kernel-level locks. E.g. something like

# cpu time
perf record --call-graph dwarf -F 500 -a sleep 5
perf report --no-children --sort comm,symbol

To send it to the list you can use something like
perf report --no-children --sort comm,symbol | head -n 500 > somefile

# kernel level blocking on locks
perf record --call-graph dwarf -e syscalls:sys_enter_futex -a sleep 3
perf report --no-children --sort comm,symbol

Greetings,

Andres Freund
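A way to collect this repeatedly rather than as a single snapshot (a sketch, assuming an interactive psql session; \watch re-runs the previous query at the given interval):

-- Sample the wait-event breakdown every 5 seconds; interrupt with Ctrl+C.
SELECT state, backend_type, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3, 4
ORDER BY count(*) DESC;
\watch 5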
RE: LWLocks by LockManager slowing large DB
Thanks for the quick reply!

These queries take ~1hr and are the only thing running on the system (all 60
are launched at the same time and the tables/files are fully primed into
memory, so iowaits are basically zero).

Yes, that's the same query I've been running to analyze the locks, and this
is the problem:

SELECT state, backend_type, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
GROUP BY 1, 2, 3, 4
ORDER BY count(*) DESC;

 state  | backend_type                 | wait_event_type | wait_event          | count
--------+------------------------------+-----------------+---------------------+-------
 active | client backend               | LWLock          | LockManager         |    28
 active | client backend               |                 |                     |    21
        | autovacuum launcher          | Activity        | AutoVacuumMain      |     1
        | logical replication launcher | Activity        | LogicalLauncherMain |     1
        | checkpointer                 | Activity        | CheckpointerMain    |     1
 idle   | client backend               | Client          | ClientRead          |     1
        | background writer            | Activity        | BgWriterMain        |     1
        | walwriter                    | Activity        | WalWriterMain       |     1

Thanks again for any advice you have.

---Paul
Re: LWLocks by LockManager slowing large DB
Hi,

On 2021-04-12 15:15:05 -0700, Paul Friedman wrote:
> Thanks again for any advice you have.

I think we'd need the perf profiles to be able to dig into this further.

It's odd that there's a meaningful amount of LockManager contention in your
case - assuming the stats you collected weren't just from the first few
milliseconds of starting those 60 queries, there shouldn't be any additional
"heavyweight locks" taken given the duration of your queries. The futex
profile hopefully will tell us where that is coming from...

Greetings,

Andres Freund
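As a quick additional check (a sketch, not a substitute for the perf profiles): pg_locks shows which heavyweight locks each backend currently holds and whether they were acquired via the per-backend fast path; locks that cannot use the fast path go through the shared lock table, which is what the LockManager LWLocks protect.

-- Granted heavyweight locks by type, mode, and fast-path status.
SELECT locktype, mode, fastpath, count(*)
FROM pg_locks
WHERE granted
GROUP BY 1, 2, 3
ORDER BY count(*) DESC;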
RE: LWLocks by LockManager slowing large DB
Yes, I ran the query after a couple of minutes. Those are the steady-state
numbers.

Also 'top' shows:

top - 22:44:26 up 12 days, 23:14,  5 users,  load average: 20.99, 21.35, 19.27
Tasks: 859 total,  26 running, 539 sleeping,   0 stopped,   0 zombie
%Cpu(s): 34.3 us,  1.6 sy,  0.0 ni, 64.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 24742353+total, 33723356 free, 73160656 used, 14053952+buff/cache
KiB Swap:        0 total,        0 free,        0 used. 17132937+avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
30070 postgres  20   0 41.232g  28608  18192 R  53.8  0.0  17:35.74 postgres
30087 postgres  20   0 41.233g  28408  18180 S  53.8  0.0  17:35.69 postgres
30055 postgres  20   0 41.233g  28492  18120 R  53.5  0.0  17:41.51 postgres

Note the postgres processes only running at 53% with the system at 64% idle.
The 1.7% system time seems indicative of the spinlocks blocking the major
processing.

Do you know what resource the LockManager might be blocking on/protecting
with these LWLocks?

Also, I didn't understand your comment about a 'futex profile', could you
point me in the right direction here?

Thanks.

---Paul
Re: LWLocks by LockManager slowing large DB
Hi,

On 2021-04-12 15:56:08 -0700, Paul Friedman wrote:
> Also, I didn't understand your comment about a 'futex profile', could you
> point me in the right direction here?

My earlier mail included a section about running a perf profile showing the
callers of the futex() system call, which in turn is what lwlocks end up
using on linux when the lock is contended. Check the second half of:
https://www.postgresql.org/message-id/20210412215738.xytq33wlciljyva5%40alap3.anarazel.de

Greetings,

Andres Freund
Re: LWLocks by LockManager slowing large DB
On Mon, Apr 12, 2021 at 14:57 Andres Freund wrote:
> Without knowing the proportion of LockManager wait events compared to
> the rest it's hard to know what to make of it.

These OSS tools can be useful to understand the proportion:

- pgCenter
  https://github.com/lesovsky/pgcenter
- pg_wait_sampling (can be used together with POWA monitoring)
  https://github.com/postgrespro/pg_wait_sampling
- pgsentinel
  https://github.com/pgsentinel/pgsentinel
- PASH Viewer (good for visualization, integrates with pgsentinel)
  https://github.com/dbacvetkov/PASH-Viewer
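For example, with pg_wait_sampling (a sketch; it assumes the library has been added to shared_preload_libraries and the server restarted), the accumulated wait-event profile can be queried directly instead of sampling pg_stat_activity by hand:

CREATE EXTENSION pg_wait_sampling;

-- Wait events sampled across all backends, aggregated since the last reset.
SELECT event_type, event, sum(count) AS samples
FROM pg_wait_sampling_profile
GROUP BY 1, 2
ORDER BY samples DESC
LIMIT 20;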
