Thank you very much for your prompt responses.
I have analysed this further and found the long-running query.
I ran EXPLAIN ANALYZE on this query and got the following results. (We have 2
identical DB instances and they contain the same data. Instance 1 took 20+ seconds
to process the query and instance 2 took less than a second.)
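For completeness, this is roughly how the plans below were captured. The BUFFERS
option is not in the output; I understand it would add shared-buffer hit/read
counts, so I can include it next time if that helps:

EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;  -- the document search query itself, omitted here for brevity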
Instance 1: (used by regular users - more than 600,000 requests a day) -
The result is the same even when there are no users on the server.
EXPLAIN ANALYZE
Nested Loop Semi Join (cost=998547.53..3319573.36 rows=1 width=8) (actual
time=10568.217..22945.971 rows=22 loops=1)
-> Hash Semi Join (cost=998546.96..3319545.95 rows=41 width=16) (actual
time=10568.198..22945.663 rows=22 loops=1)
Hash Cond: (node.id = prop.node_id)
-> Bitmap Heap Scan on alf_node node (cost=995009.97..3303978.85
rows=4565737 width=8) (actual time=3304.419..20465.551 rows=41109751 loops=1)
Recheck Cond: ((store_id = 6) AND (type_qname_id = 240))
Rows Removed by Index Recheck: 54239131
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 2816
Heap Blocks: exact=24301 lossy=1875383
-> Bitmap Index Scan on idx_alf_node_mdq (cost=0.00..646144.01
rows=20047144 width=0) (actual time=3232.067..3232.067 rows=44246360 loops=1)
Index Cond: ((store_id = 6) AND (type_qname_id = 240))
SubPlan 1
-> Bitmap Heap Scan on alf_node_aspects aspect_1
(cost=2503.51..347403.58 rows=128379 width=8) (actual time=25.447..65.392
rows=5635 loops=1)
Recheck Cond: (qname_id = 251)
Heap Blocks: exact=40765
-> Bitmap Index Scan on fk_alf_nasp_qn (cost=0.00..2471.41
rows=128379 width=0) (actual time=18.835..18.835 rows=239610 loops=1)
Index Cond: (qname_id = 251)
-> Hash (cost=3526.11..3526.11 rows=871 width=8) (actual time=0.045..0.045
rows=23 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Only Scan using idx_alf_nprop_s on alf_node_properties prop
(cost=0.70..3526.11 rows=871 width=8) (actual time=0.021..0.042 rows=23 loops=1)
Index Cond: ((qname_id = '242'::bigint) AND (string_value =
'E292432'::text))
Heap Fetches: 23
-> Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect
(cost=0.57..2.01 rows=15 width=8) (actual time=0.011..0.011 rows=1 loops=22)
Index Cond: ((node_id = node.id) AND (qname_id = 245))
Heap Fetches: 22
Planning time: 0.639 ms
Execution time: 22946.036 ms
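The estimates on instance 1 look far off from the actual row counts (the bitmap
heap scan on alf_node estimates 4,565,737 rows but returns 41,109,751), so before
touching any settings I am planning to check whether (auto)vacuum and analyze have
kept up on that instance. A sketch of the check I have in mind (table names taken
from the plan above; I have not run this yet):

SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('alf_node', 'alf_node_properties', 'alf_node_aspects');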
Instance 2: (used only by testers - 250 requests a day)
Nested Loop Semi Join (cost=6471.94..173560841.08 rows=2 width=8) (actual
time=0.162..0.464 rows=17 loops=1)
-> Nested Loop (cost=6471.37..173560684.36 rows=45 width=16) (actual
time=0.154..0.387 rows=17 loops=1)
-> HashAggregate (cost=3508.15..3516.80 rows=865 width=8) (actual
time=0.041..0.047 rows=18 loops=1)
Group Key: prop.node_id
-> Index Only Scan using idx_alf_nprop_s on alf_node_properties prop
(cost=0.70..3505.99 rows=866 width=8) (actual time=0.020..0.035 rows=18 loops=1)
Index Cond: ((qname_id = '242'::bigint) AND (string_value =
'E292432'::text))
Heap Fetches: 18
-> Index Scan using alf_node_pkey on alf_node node
(cost=2963.22..200644.11 rows=1 width=8) (actual time=0.019..0.019 rows=1
loops=18)
Index Cond: (id = prop.node_id)
Filter: ((type_qname_id <> 145) AND (store_id = 6) AND
(type_qname_id = 240) AND (NOT (SubPlan 1)))
Rows Removed by Filter: 0
SubPlan 1
-> Materialize (cost=2962.65..397912.89 rows=158204 width=8)
(actual time=0.001..0.009 rows=85 loops=17)
-> Bitmap Heap Scan on alf_node_aspects aspect_1
(cost=2962.65..396503.87 rows=158204 width=8) (actual time=0.021..0.082 rows=85 loops=1)
Recheck Cond: (qname_id = 251)
Heap Blocks: exact=55
-> Bitmap Index Scan on fk_alf_nasp_qn
(cost=0.00..2923.10 rows=158204 width=0) (actual time=0.015..0.015 rows=87 loops=1)
Index Cond: (qname_id = 251)
-> Index Only Scan using alf_node_aspects_pkey on alf_node_aspects aspect
(cost=0.57..34.32 rows=12 width=8) (actual time=0.004..0.004 rows=1 loops=17)
Index Cond: ((node_id = node.id) AND (qname_id = 245))
Heap Fetches: 17
Planning time: 0.623 ms
Execution time: 0.540 ms
The configurations are the same on both servers.
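To double-check that, I intend to compare the relevant settings directly on both
instances with something like the following (the list covers the parameters I
mention below; note that on 9.6 checkpoint_segments has been replaced by
max_wal_size):

SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('work_mem', 'shared_buffers', 'effective_cache_size',
               'enable_seqscan', 'checkpoint_completion_target', 'max_wal_size');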
Please advise me on this. Is there any configuration I should specifically look
at, such as “work_mem”, “shared_buffers”, “checkpoint_segments”,
“effective_cache_size”, “enable_seqscan” or “checkpoint_completion_target”?
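One thing that stands out to me in the instance 1 plan is the bitmap heap scan
going lossy (Heap Blocks: exact=24301 lossy=1875383), which as far as I understand
happens when the bitmap no longer fits in work_mem. As a test I could raise it for
a single session and re-run the query, roughly like this (the value is only
illustrative, not a recommendation):

SET work_mem = '256MB';
EXPLAIN ANALYZE
SELECT ...;  -- the same document search query
RESET work_mem;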
Thanks in advance.
Fahiz
On 9 Dec 2019, 19:03 +0000, Michael Lewis <[email protected]>, wrote:
> > > There is a specific search query I am running to get a list of documents
> > > and their metadata from several tables in the DB.
> > > We are running Postgres 9.6.9 on Amazon RDS (db.m5.4xlarge instance)
> > >
> > > Our current DB consists of 500GB of data and indexes. The largest table
> > > contains 454,078,915 rows.
> > >
> > > With a fresh DB restored from the data, without any indexes, the search
> > > query performs relatively quickly and most of the time takes less than a
> > > second.
> > >
> > > But after 3 weeks of use of the DB it suddenly started to slow down, only
> > > for this particular query, and it takes 20+ seconds to respond. If I
> > > restore the DB again then it works fine again, and the symptom reappears
> > > after about 3 weeks.
> >
> >
> > You haven't been quite clear on the situation and your use case, but
> > assuming this table has 454 million rows and experiences updates/deletes
> > then this sounds like you may be having problems with autovacuum. Have you
> > customized parameters to ensure it is running more frequently than default?
> > How are you doing those data restores? Perhaps that process is cleaning up
> > the accumulated bloat and you can run fine again for a while. Check
> > pg_stat_user_tables for the last (auto)vacuum that ran, assuming you didn't
> > just restore again and are expecting the issue to occur again soon.