AW: Query performance issue

2020-09-08 Thread Dirk Krautschick
Hi %,

in order to be able to readjust the effects of the stored procedure and, if 
necessary, 
to save turnaround times, different requests can be concatenated using 
semicolons for
bundling several statements in one request. We did some tests against a 
postgres cluster.

The results in terms of optimizations are as follows:


Batchsize  | clients|  count Queries | average s/query| comment
--|-|--|--|---
1| 1|  15.86k|  2.24ms | 
10   | 1|  31.80k|  332us  | 
25   | 1|  31.75k|  312us  | 
50   | 1|  32.00k|  280us  | 
100  | 1|  32.00k|  286us  | 
 |  ||  
  | 
1| 2|  57.1k |  733us  | Drop to 
30k after some time!!
10   | 2|  63.6k |  323us  | 
25   | 2|  63.5k |  308us  | 
50   | 2|  64k   |  293us  | 
100  | 2|  67.2k |  290us  | 
   |   ||   
 | 
1| 10   |  158.6k|  2.15ms | 
10   | 10   |  298.9k|  383us  | Drop to 
~200k!!
25   | 10   |  225k  |  1.16ms | 
50   | 10   |  192k  |  1.55ms | 
100  | 10   |  201.6k|  1.44ms | 
 |  |  |
 | 
10   | 50   |  800k |  2.2ms   | 


It seems to be saturated here at around 200k requests per minute, 
the question remains why this is so.

Does anyone has experience with something similar or are there some
hints about how to optimize the postgres cluster for such bundled statements?

Thanks and best regards

Dirk


Query Performance in bundled requests

2020-09-08 Thread Dirk Krautschick
Update: Better title and format corrections

Hi %,

in order to be able to readjust the effects of the stored procedure and, if 
necessary, to save turnaround times, different requests can be concatenated 
using semicolons for bundling several statements in one request. We did some 
tests against a postgres cluster.

The results in terms of optimizations are as follows:


Batchsize  | clients|  count Queries | average s/query| comment
--|-|--|--|-
1| 1|  15.86k|  2.24ms | 
10   | 1|  31.80k|  332us  | 
25   | 1|  31.75k|  312us  | 
50   | 1|  32.00k|  280us  | 
100  | 1|  32.00k|  286us  | 
 |  ||  
  | 
1| 2|  57.1k |  733us  | Drop to 
30k after some time!!
10   | 2|  63.6k |  323us  | 
25   | 2|  63.5k |  308us  | 
50   | 2|  64k   |  293us  | 
100  | 2|  67.2k |  290us  | 
   |   ||   
 | 
1| 10   |  158.6k|  2.15ms | 
10   | 10   |  298.9k|  383us  | Drop to 
~200k!!
25   | 10   |  225k  |  1.16ms | 
50   | 10   |  192k  |  1.55ms | 
100  | 10   |  201.6k|  1.44ms | 
 |  |  |
 | 
10   | 50   |  800k |  2.2ms   | 


It seems to be saturated here at around 200k requests per minute, the question 
remains why this is so.

Does anyone has experience with something similar or are there some hints about 
how to optimize the postgres cluster for such bundled statements?

Thanks and best regards

Dirk


Re: Query Performance in bundled requests

2020-09-08 Thread Justin Pryzby
On Tue, Sep 08, 2020 at 10:30:50AM +, Dirk Krautschick wrote:
> Update: Better title and format corrections
> 
> Hi %,
> 
> in order to be able to readjust the effects of the stored procedure and, if 
> necessary, to save turnaround times, different requests can be concatenated 
> using semicolons for bundling several statements in one request. We did some 
> tests against a postgres cluster.
> 
> The results in terms of optimizations are as follows:
> 
> 
> Batchsize  | clients|  count Queries | average s/query| comment
> --|-|--|--|-
> 1  | 1|  15.86k|  2.24ms | 
> 10 | 1|  31.80k|  332us  | 
> 25 | 1|  31.75k|  312us  | 
> 50 | 1|  32.00k|  280us  | 

I guess you're looking at the minimum of 280us.

; 1/(280e-6) * 60
~214285.71428571428571428571

> the question remains why this is so.

You can't expect it to go a billion times faster just by putting a billion
queries in one request, and at 50 batches it looks like you've hit the next
performance bottleneck.  Whether that's CPU / IO / network / locks / RAM /
planner / context switches / logging / ??? remains to be seen.

> Does anyone has experience with something similar or are there some
> hints about how to optimize the postgres cluster for such bundled statements?

I think at this step you want to optimize for what the statements are doing,
not for the statements themselves.  Could you send a query plan for the stored
procedure ?

Also, you'd maybe want to think if there's a way you can avoid making 100s of
1000s of requests per second, rather than trying to optimize for it.  Can you
make another stored procedure which handles N requests rather than calling this
SP N times ?  There's no guarantee that won't hit the same or other bottleneck,
until you see what that is.

-- 
Justin




AWS RDS PostgreSQL CPU Spiking to 100%

2020-09-08 Thread aditya desai
Hi,
We have an application where one of the APIs calling queries(attached) is
spiking the CPU to 100% during load testing.
However, queries are making use of indexes(Bitmap Index and Bitmap Heap
scan though). When run separately on DB queries hardly take less than 200
ms. Is CPU spiking due to Bitmap Heap Scan?
These queries are being called thousands of times. Application team says
they have handled connection pooling from the Application side. So there is
no connection pooling here from DB side. Current db instance size is
"db.m4.4xlarge"
64 GB RAM 16 vCPU".
The Application dev team has primary keys and foreign keys on tables so
they are unable to partition the tables as well due to limitations of
postgres partitioning. Columns in WHERE clauses are not constant in all
queries to decide partition keys.

1. Does DB need more CPU considering this kind of load?
2. Can the query be tuned further? It is already using indexes(Bitmap
though).
3. Will connection pooling resolve the CPU Spike issues?

Also pasting Query and plans below.

--exampleCount 1. Without
internalexamplecode---

lmp_examples=> explain analyze with exampleCount as ( select
examplestatuscode from example j where 1=1 and j.countrycode = 'AD'   and
j.facilitycode in ('ABCD') and j.internalexamplecode in
('005','006','007','005') and ((j.examplestartdatetime  between '2020-05-18
00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )
group by j.examplestatuscode)
lmp_examples-> select js.examplestatuscode,COALESCE(count(*),0) stat_count
from exampleCount jc right outer join examplestatus js on
jc.examplestatuscode=js.examplestatuscode group by js.examplestatuscode ;


 QUERY PLAN



 HashAggregate  (cost=79353.80..79353.89 rows=9 width=12) (actual
time=88.847..88.850 rows=9 loops=1)
   Group Key: js.examplestatuscode
   CTE examplecount
 ->  HashAggregate  (cost=79352.42..79352.46 rows=4 width=4) (actual
time=88.803..88.805 rows=5 loops=1)
   Group Key: j.examplestatuscode
   ->  Bitmap Heap Scan on example j  (cost=1547.81..79251.08
rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
 Recheck Cond: countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
'2020-08-19 00:00:00'::timestamp without time zone)) OR
(examplestartdatetime IS NULL))
 Filter: (((countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
ANY ('{005,006,007,005}'::text[])))
 Rows Removed by Filter: 3
 Heap Blocks: exact=18307
 ->  BitmapOr  (cost=1547.81..1547.81 rows=40538 width=0)
(actual time=15.707..15.707 rows=0 loops=1)
   ->  Bitmap Index Scan on example_list9_idx
(cost=0.00..1523.10 rows=40538 width=0) (actual time=15.702..15.702
rows=62851 loops=1)
 Index Cond: (((countrycode)::text =
'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND
((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND
(examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time
zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without
time zone))
   ->  Bitmap Index Scan on example_list10_idx
(cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
 Index Cond: (examplestartdatetime IS NULL)
   ->  Hash Left Join  (cost=0.13..1.29 rows=9 width=4) (actual
time=88.831..88.840 rows=9 loops=1)
 Hash Cond: ((js.examplestatuscode)::text =
(jc.examplestatuscode)::text)
 ->  Seq Scan on examplestatus js  (cost=0.00..1.09 rows=9 width=4)
(actual time=0.004..0.007 rows=9 loops=1)
 ->  Hash  (cost=0.08..0.08 rows=4 width=16) (actual
time=88.817..88.817 rows=5 loops=1)
   Buckets: 1024  Batches: 1  Memory Usage: 9kB
   ->  CTE Scan on examplecount jc  (cost=0.00..0.08 rows=4
width=16) (actual time=88.807..88.812 rows=5 loops=1)
 Planning Time: 0.979 ms
 Execution Time: 89.036 ms
(23 rows)


exampleCount 2. With
internalexamplecode-


lmp_examples=> explain analyze with exampleCount as ( select
examplestatuscode,count(1) stat_count from example j where 1=1 and
j.countrycode = 'AD'   and j.facilitycode in ('ABCD') and
j.internalexamplecode in ('005','0

Re: AWS RDS PostgreSQL CPU Spiking to 100%

2020-09-08 Thread Pavel Stehule
Ășt 8. 9. 2020 v 15:33 odesĂ­latel aditya desai  napsal:

> Hi,
> We have an application where one of the APIs calling queries(attached) is
> spiking the CPU to 100% during load testing.
> However, queries are making use of indexes(Bitmap Index and Bitmap Heap
> scan though). When run separately on DB queries hardly take less than 200
> ms. Is CPU spiking due to Bitmap Heap Scan?
> These queries are being called thousands of times. Application team says
> they have handled connection pooling from the Application side. So there is
> no connection pooling here from DB side. Current db instance size is 
> "db.m4.4xlarge"
> 64 GB RAM 16 vCPU".
> The Application dev team has primary keys and foreign keys on tables so
> they are unable to partition the tables as well due to limitations of
> postgres partitioning. Columns in WHERE clauses are not constant in all
> queries to decide partition keys.
>
>
if you have a lot of connection/disconnection per sec (more than ten or
twenty), then connection pooling can be a significant win.

One symptom of this issue can be high cpu.

Regards

Pavel



> 1. Does DB need more CPU considering this kind of load?
> 2. Can the query be tuned further? It is already using indexes(Bitmap
> though).
> 3. Will connection pooling resolve the CPU Spike issues?
>
> Also pasting Query and plans below.
>
> --exampleCount 1. Without
> internalexamplecode---
>
> lmp_examples=> explain analyze with exampleCount as ( select
> examplestatuscode from example j where 1=1 and j.countrycode = 'AD'   and
> j.facilitycode in ('ABCD') and j.internalexamplecode in
> ('005','006','007','005') and ((j.examplestartdatetime  between '2020-05-18
> 00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )
> group by j.examplestatuscode)
> lmp_examples-> select js.examplestatuscode,COALESCE(count(*),0) stat_count
> from exampleCount jc right outer join examplestatus js on
> jc.examplestatuscode=js.examplestatuscode group by js.examplestatuscode ;
>
>
>QUERY PLAN
>
>
>
> 
>  HashAggregate  (cost=79353.80..79353.89 rows=9 width=12) (actual
> time=88.847..88.850 rows=9 loops=1)
>Group Key: js.examplestatuscode
>CTE examplecount
>  ->  HashAggregate  (cost=79352.42..79352.46 rows=4 width=4) (actual
> time=88.803..88.805 rows=5 loops=1)
>Group Key: j.examplestatuscode
>->  Bitmap Heap Scan on example j  (cost=1547.81..79251.08
> rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
>  Recheck Cond: countrycode)::text = 'AD'::text) AND
> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
> ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
> 00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
> '2020-08-19 00:00:00'::timestamp without time zone)) OR
> (examplestartdatetime IS NULL))
>  Filter: (((countrycode)::text = 'AD'::text) AND
> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
> ANY ('{005,006,007,005}'::text[])))
>  Rows Removed by Filter: 3
>  Heap Blocks: exact=18307
>  ->  BitmapOr  (cost=1547.81..1547.81 rows=40538 width=0)
> (actual time=15.707..15.707 rows=0 loops=1)
>->  Bitmap Index Scan on example_list9_idx
> (cost=0.00..1523.10 rows=40538 width=0) (actual time=15.702..15.702
> rows=62851 loops=1)
>  Index Cond: (((countrycode)::text =
> 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND
> ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND
> (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time
> zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without
> time zone))
>->  Bitmap Index Scan on example_list10_idx
> (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
>  Index Cond: (examplestartdatetime IS NULL)
>->  Hash Left Join  (cost=0.13..1.29 rows=9 width=4) (actual
> time=88.831..88.840 rows=9 loops=1)
>  Hash Cond: ((js.examplestatuscode)::text =
> (jc.examplestatuscode)::text)
>  ->  Seq Scan on examplestatus js  (cost=0.00..1.09 rows=9
> width=4) (actual time=0.004..0.007 rows=9 loops=1)
>  ->  Hash  (cost=0.08..0.08 rows=4 width=16) (actual
> time=88.817..88.817 rows=5 loops=1)
>Buckets: 1024  Batches: 1  Memory Usage: 9kB
>->  CTE Scan on examplecount jc  (cost=0.00..0.08 rows=4
> width=16)

Re: AWS RDS PostgreSQL CPU Spiking to 100%

2020-09-08 Thread Jeff Janes
On Tue, Sep 8, 2020 at 9:33 AM aditya desai  wrote:

> Hi,
> We have an application where one of the APIs calling queries(attached) is
> spiking the CPU to 100% during load testing.
> However, queries are making use of indexes(Bitmap Index and Bitmap Heap
> scan though).
>

The CPU is there to be used.  Anything will use 100% of the CPU unless it
runs into some other bottleneck first.

These queries are being called thousands of times.
>

Over what time period?  At what concurrency level?



> Application team says they have handled connection pooling from the
> Application side.
>

Did they do it correctly?  Are you seeing a lot of connections churning
through?


> 1. Does DB need more CPU considering this kind of load?
>

Is it currently running fast enough, or does it need to be faster?


> 2. Can the query be tuned further?
>

The query you show can't possibly generate the plan you show, so there is
no way to know that.


> 3. Will connection pooling resolve the CPU Spike issues?
>

Not if the app-side pooling was done correctly.


>
> Also pasting Query and plans below.
>
> --exampleCount 1. Without
> internalexamplecode---
>
> lmp_examples=> explain analyze with exampleCount as ( select
> examplestatuscode from example j where 1=1 and j.countrycode = 'AD'   and
> j.facilitycode in ('ABCD') and j.internalexamplecode in
> ('005','006','007','005') and ((j.examplestartdatetime  between '2020-05-18
> 00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )
> group by j.examplestatuscode)
> lmp_examples-> select js.examplestatuscode,COALESCE(count(*),0) stat_count
> from exampleCount jc right outer join examplestatus js on
> jc.examplestatuscode=js.examplestatuscode group by js.examplestatuscode ;
>
>
>QUERY PLAN
>
>
>
> 
>  HashAggregate  (cost=79353.80..79353.89 rows=9 width=12) (actual
> time=88.847..88.850 rows=9 loops=1)
>Group Key: js.examplestatuscode
>CTE examplecount
>  ->  HashAggregate  (cost=79352.42..79352.46 rows=4 width=4) (actual
> time=88.803..88.805 rows=5 loops=1)
>Group Key: j.examplestatuscode
>->  Bitmap Heap Scan on example j  (cost=1547.81..79251.08
> rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
>  Recheck Cond: countrycode)::text = 'AD'::text) AND
> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
> ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
> 00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
> '2020-08-19 00:00:00'::timestamp without time zone)) OR
> (examplestartdatetime IS NULL))
>

Note that the parenthesization of the OR condition is different between the
recheck, and the query itself.  So I think that either the query or the
plan has not been presented accurately.  Please double check them.

Also, what version of PostgreSQL are you using?  In v12, the CTE gets
optimized away entirely.

Cheers,

Jeff

>