Re: AWS RDS PostgreSQL CPU Spiking to 100%

2020-09-28 Thread aditya desai
>
>
> Hi,
> We have an application where one of the APIs, calling the queries
> (attached), is spiking the CPU to 100% during load testing.
> The queries do use indexes (Bitmap Index Scans feeding a Bitmap Heap
> Scan). When run individually against the DB, the queries take less than
> 200 ms each. Is the CPU spiking due to the Bitmap Heap Scan?
> These queries are called thousands of times. The application team says
> connection pooling is handled on the application side, so there is no
> pooling on the DB side. The current DB instance size is "db.m4.4xlarge"
> (64 GB RAM, 16 vCPU).
> The application dev team has primary keys and foreign keys on the
> tables, so they are unable to partition them, given the limitations of
> Postgres partitioning. The columns in the WHERE clauses are not the same
> across all queries, so there is no obvious partition key.
>
> 1. Does the DB need more CPU for this kind of load?
> 2. Can the query be tuned further? It is already using (bitmap) indexes.
> 3. Will connection pooling resolve the CPU spikes?
>
> Also pasting Query and plans below.
>
> ---- exampleCount 1. Without internalexamplecode ----
>
> lmp_examples=> explain analyze with exampleCount as ( select
> examplestatuscode from example j where 1=1 and j.countrycode = 'AD'   and
> j.facilitycode in ('ABCD') and j.internalexamplecode in
> ('005','006','007','005') and ((j.examplestartdatetime  between '2020-05-18
> 00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )
> group by j.examplestatuscode)
> lmp_examples-> select js.examplestatuscode,COALESCE(count(*),0) stat_count
> from exampleCount jc right outer join examplestatus js on
> jc.examplestatuscode=js.examplestatuscode group by js.examplestatuscode ;
>
>
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  HashAggregate  (cost=79353.80..79353.89 rows=9 width=12) (actual
> time=88.847..88.850 rows=9 loops=1)
>Group Key: js.examplestatuscode
>CTE examplecount
>  ->  HashAggregate  (cost=79352.42..79352.46 rows=4 width=4) (actual
> time=88.803..88.805 rows=5 loops=1)
>Group Key: j.examplestatuscode
>->  Bitmap Heap Scan on example j  (cost=1547.81..79251.08
> rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
>  Recheck Cond: ((((countrycode)::text = 'AD'::text) AND
> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
> ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
> 00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
> '2020-08-19 00:00:00'::timestamp without time zone)) OR
> (examplestartdatetime IS NULL))
>  Filter: (((countrycode)::text = 'AD'::text) AND
> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
> ANY ('{005,006,007,005}'::text[])))
>  Rows Removed by Filter: 3
>  Heap Blocks: exact=18307
>  ->  BitmapOr  (cost=1547.81..1547.81 rows=40538 width=0)
> (actual time=15.707..15.707 rows=0 loops=1)
>->  Bitmap Index Scan on example_list9_idx
> (cost=0.00..1523.10 rows=40538 width=0) (actual time=15.702..15.702
> rows=62851 loops=1)
>  Index Cond: (((countrycode)::text =
> 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND
> ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND
> (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time
> zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without
> time zone))
>->  Bitmap Index Scan on example_list10_idx
> (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
>  Index Cond: (examplestartdatetime IS NULL)
>->  Hash Left Join  (cost=0.13..1.29 rows=9 width=4) (actual
> time=88.831..88.840 rows=9 loops=1)
>  Hash Cond: ((js.examplestatuscode)::text =
> (jc.examplestatuscode)::text)
>  ->  Seq Scan on examplestatus js  (cost=0.00..1.09 rows=9
> width=4) (actual time=0.004..0.007 rows=9 loops=1)
>  ->  Hash  (cost=0.08..0.08 rows=4 width=16) (actual
> time=88.817..88.817 rows=5 loops=1)
>Buckets: 1024  Batches: 1  Memory Usage: 9kB
>->  CTE Scan on examplecount jc  (cost=0.00..0.08 rows=4
> width=16) (actual time=88.807..88.812 rows=5 loops=1)
>  Planning Time: 0.979 ms
>  Execution Time: 89.036 ms
> (23 rows)
>
>
> ---- exampleCount 2. With internalexamplecode ----
>
>
> lmp_examples=> explain analyze 

Re: AWS RDS PostgreSQL CPU Spiking to 100%

2020-09-28 Thread Prince Pathria
We faced a similar issue; adding RDS Proxy in front of RDS PostgreSQL can
help. In our case there were a lot of connects/disconnects from Lambda
functions, even though the Lambda concurrency was only 100. Adding a
connection pooler (RDS Proxy) reduced the CPU load from 100% to 30%.

Happy to help :)
Prince Pathria
Systems Engineer | Certified Kubernetes Administrator | AWS Certified Solutions Architect
Evive | +91 9478670472 | goevive.com
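For those not on AWS, or wanting a self-managed alternative to RDS Proxy, an external pooler such as PgBouncer gives the same effect. Below is a minimal sketch of a transaction-pooling configuration; the host, auth file path, and pool sizes are placeholders to adapt, with only the database name taken from the thread:

```ini
; pgbouncer.ini -- minimal transaction-pooling sketch (all values are placeholders)
[databases]
; route "lmp_examples" through the pooler to the RDS endpoint
lmp_examples = host=mydb.example.rds.amazonaws.com port=5432 dbname=lmp_examples

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: a server connection is held only for the duration
; of a transaction, which absorbs connect/disconnect storms
pool_mode = transaction
max_client_conn = 1000
; keep the number of real backend connections small relative to vCPUs
default_pool_size = 20
```

Clients then connect to port 6432 instead of 5432; the pooler multiplexes many short-lived client connections onto a small, steady set of backends.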


On Mon, Sep 28, 2020 at 9:21 PM aditya desai  wrote:

>> [quoted message identical to the original post above]

Is it possible to specify minimum number of rows planner should consider?

2020-09-28 Thread Timothy Garnett
Is there some way to tell the planner that, unless it is guaranteed by a
constraint or some such, it shouldn't guess that the selectivity of a
filter/anti-join is 1 row (e.g. the minimum to consider is 2 rows unless it
is guaranteed to be 1), or otherwise to make it more conservative about
worst-case possibilities? I feel like this would cover something like a
third of the more problematic planner performance issues I run into: the
kind where a query suddenly runs 60,000 times slower than it did
previously. I can live with some queries being slightly slower if I can
avoid the case where they all of a sudden never complete.
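To see why a rows=1 guess carries asymmetric risk, here is a back-of-the-envelope sketch in Python. All cost numbers are invented for illustration (not taken from the plan below): the nested loop looks cheaper if the estimate is right, and is catastrophically worse when the estimate is off by four orders of magnitude.

```python
# Illustrative planner arithmetic; every cost constant here is made up.
PER_PROBE_COST = 67_000.0      # re-evaluating an expensive inner side per outer row
HASH_SETUP_COST = 2_500_000.0  # one-time cost to hash/sort the inner side
PER_ROW_HASH_COST = 1.0        # per-outer-row cost once the hash is built

def nested_loop_cost(outer_rows):
    return outer_rows * PER_PROBE_COST

def hash_join_cost(outer_rows):
    return HASH_SETUP_COST + outer_rows * PER_ROW_HASH_COST

estimated, actual = 1, 65_000

# With the rows=1 estimate the nested loop looks ~37x cheaper, so it is chosen...
assert nested_loop_cost(estimated) < hash_join_cost(estimated)

# ...but at the actual row count it is roughly 1700x more expensive.
blowup = nested_loop_cost(actual) / hash_join_cost(actual)
print(round(blowup))
```

The point is not the exact numbers but the shape: the downside of underestimating is unbounded, while the downside of overestimating is a modest constant, which is what motivates asking for a conservative minimum.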

My current motivating example is this (... abridged) query:

PostgreSQL 11.7 on Ubuntu Linux

->  **Nested Loop Left Join**  (cost=3484616.45..5873755.65 rows=1
width=295)
  Join Filter: (hsc.c_field = c.id)
  ...
->  *Nested Loop Left Join  (cost=1072849.19..3286800.99 rows=1
width=190)*
  ->  Hash Anti Join  (cost=1072848.62..3286798.53
***rows=1***[actually 65k] width=189)
Hash Cond: (c.id = trc.field)
->  Seq Scan on c  (cost=0.00..1096064.73
rows=14328573 width=189)
->  Hash  (cost=830118.31..830118.31 rows=14794985
width=4)
  ->  Seq Scan on trc  (cost=0.00..830118.31
rows=14794985 width=4)
  ->  Index Scan using con_pkey on con  (cost=0.56..2.46
rows=1 width=9)
Index Cond: (c.con_field = id)
...
  ->  Unique  (cost=2411766.83..2479065.82 rows=4794957 width=29)
->  Sort  (cost=2411766.83..2445416.33 rows=13459797 width=29)
  Sort Key: hsc.c_field, xxx
  ->  Hash Join  (cost=11143.57..599455.83 rows=13459797
width=29)
  ...

*** is where the planner is off in its row estimation.
c.id is unique for that table; statistics target set to 10k and freshly
analyzed.
trc.field is unique for that table; statistics target set to 10k and
freshly analyzed.
Row estimates for those tables are pretty close to correct (within a
couple of %).
There is no foreign key constraint between those two tables.
c.id and trc.field are both integers with pretty similar distributions
over 1...22 million.

** is where it picks a disastrous join plan based on that mistaken row
estimate; this has to be a close call with a merge join, since the other
side is already sorted.

* this join is OK: even if it isn't the fastest join here with the correct
row count, given the index it's not much worse.


I can work around this by breaking up the query (e.g. creating a temporary
table of the selected ids, analyzing it, and then using it in the rest of
the query) or by temporarily disabling nested-loop joins (which makes other
parts of the query slower, but not dramatically so). But is there some
other reasonable, proactive way to avoid it? The query ran fine for a year
before blowing up (the trigger, I suspect, is the trc table growing larger
than the c table; originally it was smaller), and I hit similar kinds of
issues every so often.
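The temp-table workaround can be sketched as follows. This is a self-contained illustration using Python's sqlite3 so it runs anywhere; on Postgres the same pattern is CREATE TEMP TABLE ... AS SELECT ..., then ANALYZE, then the rest of the query. Table and column names follow the abridged plan above; the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Toy stand-ins for the c and trc tables from the plan above.
cur.execute("CREATE TABLE c (id INTEGER PRIMARY KEY, payload TEXT)")
cur.execute("CREATE TABLE trc (field INTEGER PRIMARY KEY)")
cur.executemany("INSERT INTO c (id, payload) VALUES (?, 'x')",
                [(i,) for i in range(1, 1001)])
# trc covers ids 1..900, so 100 ids survive the anti-join.
cur.executemany("INSERT INTO trc (field) VALUES (?)",
                [(i,) for i in range(1, 901)])

# Step 1: materialize the anti-join result instead of leaving its
# cardinality to the planner's guess.
cur.execute("""
    CREATE TEMP TABLE surviving_ids AS
    SELECT id FROM c WHERE id NOT IN (SELECT field FROM trc)
""")
# Step 2: refresh statistics so the planner sees the real row count.
cur.execute("ANALYZE")
# Step 3: run the remainder of the original query against the temp table.
count = cur.execute("SELECT COUNT(*) FROM surviving_ids").fetchone()[0]
print(count)  # 100
```

Because the temp table has real, analyzed statistics, the planner joins the rest of the query against an accurate row count rather than a rows=1 guess.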

Tim


Re: Is it possible to specify minimum number of rows planner should consider?

2020-09-28 Thread Tom Lane
Timothy Garnett  writes:
> Is there some way to tell the planner that unless it's guaranteed by a
> constraint or some such it shouldn't guess that the selectivity of a
> filter/anti-join is 1 row (e.g. minimum to consider is 2 rows unless it's
> guaranteed to be 1 row) or somehow otherwise make it more conservative
> around the worst case possibilities.

There's been some discussion in that area, but it's a hard problem
to solve in general, and especially so if you'd like to not break
a ton of queries that work nicely today.

regards, tom lane




Re: Is it possible to specify minimum number of rows planner should consider?

2020-09-28 Thread Matthew Bellew
Here is a commit that accomplishes this with a configuration parameter.

https://github.com/labkey-matthewb/postgres/commit/b1fd99f4deffbbf3db2172ccaba51a34f18d1b1a

On Mon, Sep 28, 2020 at 2:07 PM Tom Lane  wrote:

> Timothy Garnett  writes:
> > Is there some way to tell the planner that unless it's guaranteed by a
> > constraint or some such it shouldn't guess that the selectivity of a
> > filter/anti-join is 1 row (e.g. minimum to consider is 2 rows unless it's
> > guaranteed to be 1 row) or somehow otherwise make it more conservative
> > around the worst case possibilities.
>
> There's been some discussion in that area, but it's a hard problem
> to solve in general, and especially so if you'd like to not break
> a ton of queries that work nicely today.
>
> regards, tom lane
>
>
>