Re: AWS RDS PostgreSQL CPU Spiking to 100%
>
>
> Hi,
> We have an application where one of the APIs calls the attached queries and
> spikes the CPU to 100% during load testing.
> The queries do use indexes (a Bitmap Index Scan feeding a Bitmap Heap Scan,
> though), and when run individually against the database each query takes
> less than 200 ms. Is the CPU spike caused by the Bitmap Heap Scan?
> These queries are being called thousands of times. The application team
> says they handle connection pooling on the application side, so there is no
> connection pooling on the DB side. The current DB instance size is
> db.m4.4xlarge (64 GB RAM, 16 vCPU).
> The application dev team has primary keys and foreign keys on the tables,
> so they cannot partition them due to the limitations of PostgreSQL
> partitioning, and the columns in the WHERE clauses are not consistent
> across queries, so there is no obvious partition key.
>
> 1. Does the DB need more CPU for this kind of load?
> 2. Can the query be tuned further? It is already using indexes (bitmap
> scans, though).
> 3. Will connection pooling resolve the CPU spike issue?
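For questions 1 and 2, a useful first step is to confirm which statements actually account for the CPU time under load. A minimal sketch, assuming the pg_stat_statements extension is enabled (the timing columns are total_time/mean_time on PostgreSQL 12 and earlier; they were renamed to total_exec_time/mean_exec_time in 13):

-- Top statements by cumulative execution time since the stats were last reset.
SELECT query, calls, total_time, mean_time, rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;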
>
> Also pasting the query and plans below.
>
> --exampleCount 1. Without internalexamplecode---
>
> lmp_examples=> explain analyze
> with exampleCount as (
>     select examplestatuscode
>     from example j
>     where 1=1
>       and j.countrycode = 'AD'
>       and j.facilitycode in ('ABCD')
>       and j.internalexamplecode in ('005','006','007','005')
>       and ((j.examplestartdatetime between '2020-05-18 00:00:00' and '2020-08-19 00:00:00') or j.examplestartdatetime IS NULL)
>     group by j.examplestatuscode
> )
> select js.examplestatuscode, COALESCE(count(*),0) stat_count
> from exampleCount jc
> right outer join examplestatus js on jc.examplestatuscode = js.examplestatuscode
> group by js.examplestatuscode;
>
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  HashAggregate  (cost=79353.80..79353.89 rows=9 width=12) (actual time=88.847..88.850 rows=9 loops=1)
>    Group Key: js.examplestatuscode
>    CTE examplecount
>      ->  HashAggregate  (cost=79352.42..79352.46 rows=4 width=4) (actual time=88.803..88.805 rows=5 loops=1)
>            Group Key: j.examplestatuscode
>            ->  Bitmap Heap Scan on example j  (cost=1547.81..79251.08 rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
>                  Recheck Cond: ((((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without time zone)) OR (examplestartdatetime IS NULL))
>                  Filter: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])))
>                  Rows Removed by Filter: 3
>                  Heap Blocks: exact=18307
>                  ->  BitmapOr  (cost=1547.81..1547.81 rows=40538 width=0) (actual time=15.707..15.707 rows=0 loops=1)
>                        ->  Bitmap Index Scan on example_list9_idx  (cost=0.00..1523.10 rows=40538 width=0) (actual time=15.702..15.702 rows=62851 loops=1)
>                              Index Cond: (((countrycode)::text = 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without time zone))
>                        ->  Bitmap Index Scan on example_list10_idx  (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
>                              Index Cond: (examplestartdatetime IS NULL)
>    ->  Hash Left Join  (cost=0.13..1.29 rows=9 width=4) (actual time=88.831..88.840 rows=9 loops=1)
>          Hash Cond: ((js.examplestatuscode)::text = (jc.examplestatuscode)::text)
>          ->  Seq Scan on examplestatus js  (cost=0.00..1.09 rows=9 width=4) (actual time=0.004..0.007 rows=9 loops=1)
>          ->  Hash  (cost=0.08..0.08 rows=4 width=16) (actual time=88.817..88.817 rows=5 loops=1)
>                Buckets: 1024  Batches: 1  Memory Usage: 9kB
>                ->  CTE Scan on examplecount jc  (cost=0.00..0.08 rows=4 width=16) (actual time=88.807..88.812 rows=5 loops=1)
>  Planning Time: 0.979 ms
>  Execution Time: 89.036 ms
> (23 rows)
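For question 2, one direction worth testing is to aggregate the filtered example rows once and join the tiny examplestatus table afterwards, instead of going through the CTE. This is only a sketch, and it assumes the intent of stat_count is the number of matching example rows per status; the posted query, which takes count(*) over a right outer join against a grouped CTE, always returns 1 per status because the NULL-extended row is counted too.

select js.examplestatuscode,
       coalesce(jc.stat_count, 0) as stat_count
from examplestatus js
left join (
    -- same predicates as the CTE above, but keeping the per-status counts
    select j.examplestatuscode, count(*) as stat_count
    from example j
    where j.countrycode = 'AD'
      and j.facilitycode in ('ABCD')
      and j.internalexamplecode in ('005','006','007','005')
      and (j.examplestartdatetime between '2020-05-18 00:00:00' and '2020-08-19 00:00:00'
           or j.examplestartdatetime is null)
    group by j.examplestatuscode
) jc on jc.examplestatuscode = js.examplestatuscode;

Even with a better plan, an 89 ms query that touches roughly 18,000 heap blocks, issued thousands of times concurrently, can saturate 16 vCPUs on its own, so call volume and pooling matter at least as much as the plan shape.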
>
>
> exampleCount 2. With internalexamplecode-
>
> lmp_examples=> explain analyze
Re: AWS RDS PostgreSQL CPU Spiking to 100%
We faced a similar issue; adding RDS Proxy in front of RDS PostgreSQL can help.
In our case there were a lot of connects/disconnects from Lambda functions,
even though the Lambda concurrency was only 100.
Adding a connection pooler (RDS Proxy) reduced the CPU load from 100% to 30%.
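A quick way to check whether the same connect/disconnect pattern applies is to look for very young backends while the load test is running. A minimal sketch (the 10-second window is an arbitrary choice):

-- Many backends that started only seconds ago usually means connections are
-- being opened per request rather than reused by the pool.
select count(*) as total_backends,
       count(*) filter (where now() - backend_start < interval '10 seconds')
           as started_in_last_10s
from pg_stat_activity
where datname = current_database();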
Happy to help :)
Prince Pathria
Systems Engineer | Certified Kubernetes Administrator | AWS Certified Solutions Architect
Evive | +91 9478670472 | goevive.com
On Mon, Sep 28, 2020 at 9:21 PM aditya desai wrote:
Is it possible to specify minimum number of rows planner should consider?
Is there some way to tell the planner that, unless it's guaranteed by a constraint or some such, it shouldn't guess that the selectivity of a filter/anti-join is 1 row (e.g. the minimum to consider is 2 rows unless it's guaranteed to be 1 row), or somehow otherwise make it more conservative around the worst-case possibilities? I feel like this would cover something like 1/3 of the more problematic planner performance issues I run into: the kind where a query suddenly runs 60,000 times slower than it did previously. I can live with some queries being slightly slower if I can avoid the case where they all of a sudden never complete.

My current motivating example is this (... abridged) query, on PostgreSQL 11.7 on Ubuntu Linux:

->  **Nested Loop Left Join**  (cost=3484616.45..5873755.65 rows=1 width=295)
      Join Filter: (hsc.c_field = c.id)
      ...
      ->  *Nested Loop Left Join  (cost=1072849.19..3286800.99 rows=1 width=190)*
            ->  Hash Anti Join  (cost=1072848.62..3286798.53 ***rows=1*** [actually 65k] width=189)
                  Hash Cond: (c.id = trc.field)
                  ->  Seq Scan on c  (cost=0.00..1096064.73 rows=14328573 width=189)
                  ->  Hash  (cost=830118.31..830118.31 rows=14794985 width=4)
                        ->  Seq Scan on trc  (cost=0.00..830118.31 rows=14794985 width=4)
            ->  Index Scan using con_pkey on con  (cost=0.56..2.46 rows=1 width=9)
                  Index Cond: (c.con_field = id)
      ...
      ->  Unique  (cost=2411766.83..2479065.82 rows=4794957 width=29)
            ->  Sort  (cost=2411766.83..2445416.33 rows=13459797 width=29)
                  Sort Key: hsc.c_field, xxx
                  ->  Hash Join  (cost=11143.57..599455.83 rows=13459797 width=29)
      ...

*** is where the planner is off in its row estimation:
  - c.id is unique for that table, statistics set to 10k and freshly analyzed
  - trc.field is unique for that table, statistics set to 10k and freshly analyzed
  - row estimates for those tables are pretty close to correct (within a couple of %)
  - there is no foreign key constraint between those two tables
  - c.id and trc.field are both integers with pretty similar distributions over 1...22 million

** is where it picks a disastrous join plan based on that mistaken row estimate; this has to be a close call with doing a merge join, as the other side is already sorted.

* this join is OK, since even if it isn't the fastest join here with the correct row count, given the index it's not much worse.

I can work around this by breaking up the query (e.g. creating a temporary table of the selected ids, analyzing it, then using it in the rest of the query) or by temporarily disabling nested-loop joins (which makes other parts of the query slower, but not dramatically so), but is there some other reasonable proactive way to avoid it? It was running fine for a year before blowing up (the trigger, I suspect, is the trc table getting enough larger than the c table; originally it was smaller), and I hit similar kinds of issues every so often.

Tim
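The two workarounds mentioned above, sketched with the table and column names from the abridged plan (the full query is not shown, so the surrounding SQL is assumed):

-- Workaround 1: materialize the anti-join into a temp table and ANALYZE it,
-- so the rest of the query is planned against a real row count (~65k), not 1.
begin;
create temporary table tmp_c_ids on commit drop as
    select c.id
    from c
    where not exists (select 1 from trc where trc.field = c.id);
analyze tmp_c_ids;
-- ...rest of the original query, joining tmp_c_ids instead of repeating the
-- anti-join against trc...
commit;

-- Workaround 2: disable nested-loop joins for just this transaction.
begin;
set local enable_nestloop = off;
-- ...original query...
commit;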
Re: Is it possible to specify minimum number of rows planner should consider?
Timothy Garnett writes:
> Is there some way to tell the planner that unless it's guaranteed by a
> constraint or some such it shouldn't guess that the selectivity of a
> filter/anti-join is 1 row (e.g. minimum to consider is 2 rows unless it's
> guaranteed to be 1 row) or somehow otherwise make it more conservative
> around the worst case possibilities.

There's been some discussion in that area, but it's a hard problem to solve
in general, and especially so if you'd like to not break a ton of queries
that work nicely today.

			regards, tom lane
Re: Is it possible to specify minimum number of rows planner should consider?
Here is a commit that accomplishes this with a configuration parameter:
https://github.com/labkey-matthewb/postgres/commit/b1fd99f4deffbbf3db2172ccaba51a34f18d1b1a

On Mon, Sep 28, 2020 at 2:07 PM Tom Lane wrote:
> Timothy Garnett writes:
> > Is there some way to tell the planner that unless it's guaranteed by a
> > constraint or some such it shouldn't guess that the selectivity of a
> > filter/anti-join is 1 row (e.g. minimum to consider is 2 rows unless it's
> > guaranteed to be 1 row) or somehow otherwise make it more conservative
> > around the worst case possibilities.
>
> There's been some discussion in that area, but it's a hard problem
> to solve in general, and especially so if you'd like to not break
> a ton of queries that work nicely today.
>
> 			regards, tom lane
