Re: Optimizing count(), but Explain estimates wildly off

2024-02-27 Thread Alvaro Herrera
Hi Chema,

On 2024-Feb-26, Chema wrote:

> Dear pgsqlers,
> 
> I'm trying to optimize simple queries on two tables (tenders & items) with
> a couple million records.  Besides the resulting records, the app also
> displays the count of total results.  Doing count() takes as much time as
> the other query (which can be 30+ secs), so it's an obvious target for
> optimization.  I'm already caching count() results for the most common
> conditions (country & year) in a material table, which practically halves
> response time.  The tables are updated sparingly, and only with bulk
> COPYs.  Now I'm looking for ways to optimize queries with other conditions.

It sounds like this approach might serve your purposes:
https://www.postgresql.eu/events/pgconfeu2023/schedule/session/4762-counting-things-at-the-speed-of-light-with-roaring-bitmaps/

> I already raised default_statistics_target up to 2k (the planner wasn't
> using indexes at all with low values).  Gotta get it even higher? These are
> my custom settings:

I would recommend to put the default_statistics_target back to its
original value and modify the value with ALTER TABLE .. SET STATISTICS
only for columns that need it, only on tables that need it; then ANALYZE
everything.  The planner gets too slow if you have too many stats for
everything.

> shared_buffers = 256MB  # min 128kB

This sounds far too low, unless your server is a Raspberry Pi or
something.  See "explain (buffers, analyze)" of your queries to see how
much buffer traffic is happening for them.

> Functions: 33
> Options: Inlining true, Optimization true, Expressions true, Deforming true
> Timing: Generation 14.675 ms, Inlining 383.349 ms, Optimization 1023.521
> ms, Emission 651.442 ms, Total 2072.987 ms
> Execution Time: 63378.033 ms

Also maybe experiment with turning JIT off.  Sometimes it brings no
benefit and slows down execution pointlessly.  Here you spent two
seconds JIT-compiling the query; were they worth it?

Cheers

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.




extend statistics help reduce index scan a lot of shared buffer hits.

2024-02-27 Thread James Pang
   Postgresql 14.8, Redhat8.   looks like have to create extend statistics
on indexed and joined columns to make join filters pushed down to secondary
index scan in nestloop, and the shared buffer hits show big difference.

is it expected ?


   SELECT   
FROM
mtg a LEFT OUTER JOIN mtgxxxext
b ON a.sssid = b.sssid and a.MMMUUID = b.MMMUUID and a.uuid = b.uuid
WHERE
a.SSSID=$1
AND a.MMMUID=$2
ORDER BY a. asc
offset 300 rows
FETCH FIRST 51 ROWS ONLY


explain (analyze,buffers) slowsql1(...)

1. with default, join filters just after nestloop,
Limit  (cost=5.61..5.62 rows=1 width=1169) (actual time=2249.443..2249.454
rows=51 loops=1)
   Buffers: shared hit=3864917
   ->  Sort  (cost=5.61..5.61 rows=1 width=1169) (actual
time=2249.404..2249.438 rows=351 loops=1)
 Sort Key: a.email
 Sort Method: top-N heapsort  Memory: 174kB
 Buffers: shared hit=3864917
 ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1169)
(actual time=1.335..2246.971 rows=2142 loops=1)
   Join Filter: ((a.siteid = b.siteid) AND ((a.mtguuid)::text =
(b.mtguuid)::text) AND (a.uuid = b.uuid))
   Rows Removed by Join Filter: 4586022
   Buffers: shared hit=3864917
   ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on
mtg a  (cost=0.56..2.79 rows=1 width=1093) (actual
time=0.026..5.318 rows
=2142 loops=1)
 Index Cond: ((sssid = $1) AND ((mmmuuid)::text =
($2)::text))
 Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR
((usertype)::text = 'Testlist'::text))
 Buffers: shared hit=2891
   ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on
mtgxxxext b  (cost=0.56..2.78 rows=1 width=133) (actual
time=0.016..0.698
rows=2142 loops=2142)
 Index Cond: ((sssid = $1) AND ((mmmuuid)::text =
($2)::text))
 Buffers: shared hit=3862026 <<< here  huge
shared hits.
 Planning Time: 0.033 ms
 Execution Time: 2249.527 ms



 create statistics mtgxxext_sssid_mmmuuid(dependencies,ndistinct) on
sssid, mmmuuid from mtgxxxext.
 analyze mtgxxxext.



 2. join filters pushed down to secondary index scan, and reduce a lot of
shared blks access.

 
-

 Limit  (cost=5.61..5.62 rows=1 width=1245) (actual time=12.371..12.380
rows=51 loops=1)
   Buffers: shared hit=12865
   ->  Sort  (cost=5.61..5.61 rows=1 width=1245) (actual
time=12.333..12.364 rows=351 loops=1)
 Sort Key: a.email
 Sort Method: top-N heapsort  Memory: 174kB
 Buffers: shared hit=12865
 ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1245)
(actual time=0.042..10.819 rows=2142 loops=1)
   Buffers: shared hit=12865
   ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on
mtg a  (cost=0.56..2.79 rows=1 width=1169) (actual time=0.025..2.492
 rows=2142 loops=1)
 Index Cond: ((sssid = $1) AND ((mmmuuid)::text =
($2)::text))
 Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR
((usertype)::text = 'Testlist'::text))
 Buffers: shared hit=2891
   ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on
mtgxxxext b  (cost=0.56..2.79 rows=1 width=133) (actual time=0.003..0
.003 rows=1 loops=2142)
 Index Cond: ((sssid = a.sssid) AND (sssid = $1) AND
((mmmuuid)::text = (a.mmmuuid)::text) AND ((mmmuuid)::text = ($2)::text)
AND (uui
d = a.uuid))
 Buffers: shared hit=10710<<< here much less shared
hits , and Index Cond automatically added sssid = a.sssid ,mmuuid=a.mmmuuid.
 Planning Time: 0.021 ms
 Execution Time: 12.451 ms
(17 rows)

Thanks,

James


Fwd: extend statistics help reduce index scan a lot of shared buffer hits.

2024-02-27 Thread James Pang
   Postgresql 14.8, Redhat8.   looks like have to create extend statistics
on indexed and joined columns to make join filters pushed down to secondary
index scan in nestloop, and the shared buffer hits show big difference.

is it expected ?


   SELECT   
FROM
mtg a LEFT OUTER JOIN mtgxxxext
b ON a.sssid = b.sssid and a.MMMUUID = b.MMMUUID and a.uuid = b.uuid
WHERE
a.SSSID=$1
AND a.MMMUID=$2
ORDER BY a. asc
offset 300 rows
FETCH FIRST 51 ROWS ONLY


explain (analyze,buffers) slowsql1(...)

1. with default, join filters just after nestloop,
Limit  (cost=5.61..5.62 rows=1 width=1169) (actual time=2249.443..2249.454
rows=51 loops=1)
   Buffers: shared hit=3864917
   ->  Sort  (cost=5.61..5.61 rows=1 width=1169) (actual
time=2249.404..2249.438 rows=351 loops=1)
 Sort Key: a.email
 Sort Method: top-N heapsort  Memory: 174kB
 Buffers: shared hit=3864917
 ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1169)
(actual time=1.335..2246.971 rows=2142 loops=1)
   Join Filter: ((a.sssid = b.sssid) AND ((a.mmmuuid)::text =
(b.mmmuuid)::text) AND (a.uuid = b.uuid))
   Rows Removed by Join Filter: 4586022
   Buffers: shared hit=3864917
   ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on
mtg a  (cost=0.56..2.79 rows=1 width=1093) (actual
time=0.026..5.318 rows
=2142 loops=1)
 Index Cond: ((sssid = $1) AND ((mmmuuid)::text =
($2)::text))
 Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR
((usertype)::text = 'Testlist'::text))
 Buffers: shared hit=2891
   ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on
mtgxxxext b  (cost=0.56..2.78 rows=1 width=133) (actual
time=0.016..0.698
rows=2142 loops=2142)
 Index Cond: ((sssid = $1) AND ((mmmuuid)::text =
($2)::text))
 Buffers: shared hit=3862026 <<< here  huge
shared hits.
 Planning Time: 0.033 ms
 Execution Time: 2249.527 ms



 create statistics mtgxxext_sssid_mmmuuid(dependencies,ndistinct) on
sssid, mmmuuid from mtgxxxext.
 analyze mtgxxxext.



 2. join filters pushed down to secondary index scan, and reduce a lot of
shared blks access.

 
-

 Limit  (cost=5.61..5.62 rows=1 width=1245) (actual time=12.371..12.380
rows=51 loops=1)
   Buffers: shared hit=12865
   ->  Sort  (cost=5.61..5.61 rows=1 width=1245) (actual
time=12.333..12.364 rows=351 loops=1)
 Sort Key: a.email
 Sort Method: top-N heapsort  Memory: 174kB
 Buffers: shared hit=12865
 ->  Nested Loop Left Join  (cost=1.12..5.60 rows=1 width=1245)
(actual time=0.042..10.819 rows=2142 loops=1)
   Buffers: shared hit=12865
   ->  Index Scan using idx_mmmcfattlist_mmmuuid_f on
mtg a  (cost=0.56..2.79 rows=1 width=1169) (actual time=0.025..2.492
 rows=2142 loops=1)
 Index Cond: ((sssid = $1) AND ((mmmuuid)::text =
($2)::text))
 Filter: ((joinstatus = ANY (ARRAY[$3, $4])) OR
((usertype)::text = 'Testlist'::text))
 Buffers: shared hit=2891
   ->  Index Scan using idx_mtgattndlstext_mmmuuid_uid on
mtgxxxext b  (cost=0.56..2.79 rows=1 width=133) (actual time=0.003..0
.003 rows=1 loops=2142)
 Index Cond: ((sssid = a.sssid) AND (sssid = $1) AND
((mmmuuid)::text = (a.mmmuuid)::text) AND ((mmmuuid)::text = ($2)::text)
AND (uui
d = a.uuid))
 Buffers: shared hit=10710<<< here much less shared
hits , and Index Cond automatically added sssid = a.sssid ,mmuuid=a.mmmuuid.
 Planning Time: 0.021 ms
 Execution Time: 12.451 ms
(17 rows)

Thanks,

James