Re: Optimizing count(), but Explain estimates wildly off
Hi Chema, On 2024-Feb-26, Chema wrote: > Dear pgsqlers, > > I'm trying to optimize simple queries on two tables (tenders & items) with > a couple million records. Besides the resulting records, the app also > displays the count of total results. Doing count() takes as much time as > the other query (which can be 30+ secs), so it's an obvious target for > optimization. I'm already caching count() results for the most common > conditions (country & year) in a material table, which practically halves > response time. The tables are updated sparingly, and only with bulk > COPYs. Now I'm looking for ways to optimize queries with other conditions. It sounds like this approach might serve your purposes: https://www.postgresql.eu/events/pgconfeu2023/schedule/session/4762-counting-things-at-the-speed-of-light-with-roaring-bitmaps/ > I already raised default_statistics_target up to 2k (the planner wasn't > using indexes at all with low values). Gotta get it even higher? These are > my custom settings: I would recommend to put the default_statistics_target back to its original value and modify the value with ALTER TABLE .. SET STATISTICS only for columns that need it, only on tables that need it; then ANALYZE everything. The planner gets too slow if you have too many stats for everything. > shared_buffers = 256MB # min 128kB This sounds far too low, unless your server is a Raspberry Pi or something. See "explain (buffers, analyze)" of your queries to see how much buffer traffic is happening for them. > Functions: 33 > Options: Inlining true, Optimization true, Expressions true, Deforming true > Timing: Generation 14.675 ms, Inlining 383.349 ms, Optimization 1023.521 > ms, Emission 651.442 ms, Total 2072.987 ms > Execution Time: 63378.033 ms Also maybe experiment with turning JIT off. Sometimes it brings no benefit and slows down execution pointlessly. Here you spent two seconds JIT-compiling the query; were they worth it? Cheers -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ Syntax error: function hell() needs an argument. Please choose what hell you want to involve.
extend statistics help reduce index scan a lot of shared buffer hits.
Postgresql 14.8, Redhat8. looks like have to create extend statistics on indexed and joined columns to make join filters pushed down to secondary index scan in nestloop, and the shared buffer hits show big difference. is it expected ? SELECT FROM mtg a LEFT OUTER JOIN mtgxxxext b ON a.sssid = b.sssid and a.MMMUUID = b.MMMUUID and a.uuid = b.uuid WHERE a.SSSID=$1 AND a.MMMUID=$2 ORDER BY a. asc offset 300 rows FETCH FIRST 51 ROWS ONLY explain (analyze,buffers) slowsql1(...) 1. with default, join filters just after nestloop, Limit (cost=5.61..5.62 rows=1 width=1169) (actual time=2249.443..2249.454 rows=51 loops=1) Buffers: shared hit=3864917 -> Sort (cost=5.61..5.61 rows=1 width=1169) (actual time=2249.404..2249.438 rows=351 loops=1) Sort Key: a.email Sort Method: top-N heapsort Memory: 174kB Buffers: shared hit=3864917 -> Nested Loop Left Join (cost=1.12..5.60 rows=1 width=1169) (actual time=1.335..2246.971 rows=2142 loops=1) Join Filter: ((a.siteid = b.siteid) AND ((a.mtguuid)::text = (b.mtguuid)::text) AND (a.uuid = b.uuid)) Rows Removed by Join Filter: 4586022 Buffers: shared hit=3864917 -> Index Scan using idx_mmmcfattlist_mmmuuid_f on mtg a (cost=0.56..2.79 rows=1 width=1093) (actual time=0.026..5.318 rows =2142 loops=1) Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text)) Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text)) Buffers: shared hit=2891 -> Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxext b (cost=0.56..2.78 rows=1 width=133) (actual time=0.016..0.698 rows=2142 loops=2142) Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text)) Buffers: shared hit=3862026 <<< here huge shared hits. Planning Time: 0.033 ms Execution Time: 2249.527 ms create statistics mtgxxext_sssid_mmmuuid(dependencies,ndistinct) on sssid, mmmuuid from mtgxxxext. analyze mtgxxxext. 2. join filters pushed down to secondary index scan, and reduce a lot of shared blks access. - Limit (cost=5.61..5.62 rows=1 width=1245) (actual time=12.371..12.380 rows=51 loops=1) Buffers: shared hit=12865 -> Sort (cost=5.61..5.61 rows=1 width=1245) (actual time=12.333..12.364 rows=351 loops=1) Sort Key: a.email Sort Method: top-N heapsort Memory: 174kB Buffers: shared hit=12865 -> Nested Loop Left Join (cost=1.12..5.60 rows=1 width=1245) (actual time=0.042..10.819 rows=2142 loops=1) Buffers: shared hit=12865 -> Index Scan using idx_mmmcfattlist_mmmuuid_f on mtg a (cost=0.56..2.79 rows=1 width=1169) (actual time=0.025..2.492 rows=2142 loops=1) Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text)) Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text)) Buffers: shared hit=2891 -> Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxext b (cost=0.56..2.79 rows=1 width=133) (actual time=0.003..0 .003 rows=1 loops=2142) Index Cond: ((sssid = a.sssid) AND (sssid = $1) AND ((mmmuuid)::text = (a.mmmuuid)::text) AND ((mmmuuid)::text = ($2)::text) AND (uui d = a.uuid)) Buffers: shared hit=10710<<< here much less shared hits , and Index Cond automatically added sssid = a.sssid ,mmuuid=a.mmmuuid. Planning Time: 0.021 ms Execution Time: 12.451 ms (17 rows) Thanks, James
Fwd: extend statistics help reduce index scan a lot of shared buffer hits.
Postgresql 14.8, Redhat8. looks like have to create extend statistics on indexed and joined columns to make join filters pushed down to secondary index scan in nestloop, and the shared buffer hits show big difference. is it expected ? SELECT FROM mtg a LEFT OUTER JOIN mtgxxxext b ON a.sssid = b.sssid and a.MMMUUID = b.MMMUUID and a.uuid = b.uuid WHERE a.SSSID=$1 AND a.MMMUID=$2 ORDER BY a. asc offset 300 rows FETCH FIRST 51 ROWS ONLY explain (analyze,buffers) slowsql1(...) 1. with default, join filters just after nestloop, Limit (cost=5.61..5.62 rows=1 width=1169) (actual time=2249.443..2249.454 rows=51 loops=1) Buffers: shared hit=3864917 -> Sort (cost=5.61..5.61 rows=1 width=1169) (actual time=2249.404..2249.438 rows=351 loops=1) Sort Key: a.email Sort Method: top-N heapsort Memory: 174kB Buffers: shared hit=3864917 -> Nested Loop Left Join (cost=1.12..5.60 rows=1 width=1169) (actual time=1.335..2246.971 rows=2142 loops=1) Join Filter: ((a.sssid = b.sssid) AND ((a.mmmuuid)::text = (b.mmmuuid)::text) AND (a.uuid = b.uuid)) Rows Removed by Join Filter: 4586022 Buffers: shared hit=3864917 -> Index Scan using idx_mmmcfattlist_mmmuuid_f on mtg a (cost=0.56..2.79 rows=1 width=1093) (actual time=0.026..5.318 rows =2142 loops=1) Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text)) Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text)) Buffers: shared hit=2891 -> Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxext b (cost=0.56..2.78 rows=1 width=133) (actual time=0.016..0.698 rows=2142 loops=2142) Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text)) Buffers: shared hit=3862026 <<< here huge shared hits. Planning Time: 0.033 ms Execution Time: 2249.527 ms create statistics mtgxxext_sssid_mmmuuid(dependencies,ndistinct) on sssid, mmmuuid from mtgxxxext. analyze mtgxxxext. 2. join filters pushed down to secondary index scan, and reduce a lot of shared blks access. - Limit (cost=5.61..5.62 rows=1 width=1245) (actual time=12.371..12.380 rows=51 loops=1) Buffers: shared hit=12865 -> Sort (cost=5.61..5.61 rows=1 width=1245) (actual time=12.333..12.364 rows=351 loops=1) Sort Key: a.email Sort Method: top-N heapsort Memory: 174kB Buffers: shared hit=12865 -> Nested Loop Left Join (cost=1.12..5.60 rows=1 width=1245) (actual time=0.042..10.819 rows=2142 loops=1) Buffers: shared hit=12865 -> Index Scan using idx_mmmcfattlist_mmmuuid_f on mtg a (cost=0.56..2.79 rows=1 width=1169) (actual time=0.025..2.492 rows=2142 loops=1) Index Cond: ((sssid = $1) AND ((mmmuuid)::text = ($2)::text)) Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR ((usertype)::text = 'Testlist'::text)) Buffers: shared hit=2891 -> Index Scan using idx_mtgattndlstext_mmmuuid_uid on mtgxxxext b (cost=0.56..2.79 rows=1 width=133) (actual time=0.003..0 .003 rows=1 loops=2142) Index Cond: ((sssid = a.sssid) AND (sssid = $1) AND ((mmmuuid)::text = (a.mmmuuid)::text) AND ((mmmuuid)::text = ($2)::text) AND (uui d = a.uuid)) Buffers: shared hit=10710<<< here much less shared hits , and Index Cond automatically added sssid = a.sssid ,mmuuid=a.mmmuuid. Planning Time: 0.021 ms Execution Time: 12.451 ms (17 rows) Thanks, James
