Need help on Query Tuning and Not using the Index Scan

2022-05-20 Thread Kumar, Mukesh
Hi Team,

We are facing an issue with a query that takes at least 30 seconds to run
in PostgreSQL.

We have tried creating indexes and running maintenance, but the query still
takes the same amount of time.

Below is the explain plan for the query.

https://explain.depesz.com/s/sPo2#html

We have noticed that most of the time is spent in a Seq Scan on table
ps_delay_statistic, which consists of approx. 35,344,812 records.

Can anyone please help with the above issue?

Thanks and Regards,
Mukesh Kumar



Selecting RAM and CPU based on max_connections

2022-05-20 Thread aditya desai
Hi,
One of our applications needs 3000 max_connections to the database.
Connection poolers like pgbouncer or pgpool are not certified within the
organization yet, so they are looking at setting up high-specification
hardware with enough CPU and memory. Can someone advise how much memory
and CPU they will need if they want max_connections = 3000?

Regards,
Aditya.


Re: Selecting RAM and CPU based on max_connections

2022-05-20 Thread Andreas Kretschmer
On 20 May 2022 10:27:50 CEST, aditya desai  wrote:
>Hi,
>One of our applications needs 3000 max_connections to the database.
>Connection poolers like pgbouncer or pgpool are not certified within the
>organization yet, so they are looking at setting up high-specification
>hardware with enough CPU and memory. Can someone advise how much memory
>and CPU they will need if they want max_connections = 3000?
>
>Regards,
>Aditya.

Pgbouncer would be the best solution. CPU: number of concurrent connections.
RAM: shared_buffers + max_connections * work_mem + maintenance_work_mem +
operating system + ...
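
As a rough illustration with made-up values (shared_buffers = 16 GB,
work_mem = 4 MB, maintenance_work_mem = 1 GB), that formula gives:

    16 GB + 3000 * 4 MB + 1 GB + OS overhead
  = 16 GB + 12 GB + 1 GB + a few GB
  = roughly 30 GB of RAM as a floor

And keep in mind that work_mem is a per-sort/per-hash limit, not a
per-connection limit, so one complex query can allocate it several times;
treat any such estimate as a lower bound.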
 


-- 
2ndQuadrant - The PostgreSQL Support Company




Re: Selecting RAM and CPU based on max_connections

2022-05-20 Thread Laurenz Albe
On Fri, 2022-05-20 at 12:15 +0200, Andreas Kretschmer wrote:
> On 20 May 2022 10:27:50 CEST, aditya desai  wrote:
> > One of our applications needs 3000 max_connections to the database.
> > Connection poolers like pgbouncer or pgpool are not certified within the
> > organization yet, so they are looking at setting up high-specification
> > hardware with enough CPU and memory. Can someone advise how much memory
> > and CPU they will need if they want max_connections = 3000?
> 
> Pgbouncer would be the best solution. CPU: number of concurrent connections.
> RAM: shared_buffers + max_connections * work_mem + maintenance_work_mem +
> operating system + ...

Right.  And then hope and pray that a) the database doesn't get overloaded
and b) you don't hit any of the database-internal bottlenecks caused by many
connections.

I also got the feeling that the Linux kernel's memory accounting somehow lags.
I have seen cases where every snapshot of "pg_stat_activity" I took showed
only a few active connections (but each time different ones), but the
amount of allocated memory exceeded what the currently active sessions could
consume.  I may have made a mistake, and I have no reproducer, but I would
be curious to know if there is an explanation for that.
(I am aware that "top" shows shared buffers multiple times).
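(The snapshots were essentially of the form below; this is a generic
example, not the exact query I used:

SELECT pid, state, backend_type, query
  FROM pg_stat_activity
 WHERE state = 'active';

taken repeatedly while watching the kernel's memory figures.)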

Yours,
Laurenz Albe




Re: Need help on Query Tuning and Not using the Index Scan

2022-05-20 Thread Laurenz Albe
On Fri, 2022-05-20 at 07:37 +, Kumar, Mukesh wrote:
> We are facing an issue with a query that takes at least 30 seconds to run
> in PostgreSQL.
>  
> We have tried creating indexes and running maintenance, but the query
> still takes the same amount of time.
>  
> Below is the explain plan for the query.
>  
> https://explain.depesz.com/s/sPo2#html
>  
> We have noticed that most of the time is spent in a Seq Scan on table
> ps_delay_statistic, which consists of approx. 35,344,812 records.
>  
> Can anyone please help with the above issue?

The problem is probably here:

->  GroupAggregate  (cost=0.57..18153.25 rows=2052 width=23) (actual time=13.764..13.765 rows=1 loops=1)
      Group Key: ds_1.fleet_object_number_f
      ->  Index Scan using ndx_delay_stat_equipment on ps_delay_statistic ds_1  (cost=0.57..18050.67 rows=16412 width=23) (actual time=0.026..10.991 rows=18180 loops=1)
            Index Cond: (fleet_object_number_f = (COALESCE(NULLIF('40001000277313'::text, ''::text)))::numeric)
            Filter: (activity_code_f IS NOT NULL)

which comes from this subquery:

SELECT max(dp1.daily_production_id) prodId
  FROM ps_daily_production_v dp1
 WHERE dp1.fleet_object_number =
       cast(coalesce(nullif(cast(40001000277313 AS varchar), ''), NULL) AS numeric)
   AND dp1.activity_code IS NOT NULL
 GROUP BY dp1.fleet_object_number

Remove the superfluous GROUP BY clause that confuses the optimizer.
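
Since fleet_object_number is pinned to a single value by the WHERE clause,
the GROUP BY can only ever produce one group. Without it, the subquery
becomes (a sketch, keeping your original literal):

SELECT max(dp1.daily_production_id) prodId
  FROM ps_daily_production_v dp1
 WHERE dp1.fleet_object_number =
       cast(coalesce(nullif(cast(40001000277313 AS varchar), ''), NULL) AS numeric)
   AND dp1.activity_code IS NOT NULL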

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Array of integer indexed nested-loop semi join

2022-05-20 Thread Mickael van der Beek
Hello Jeff,

Sorry for the delay, here are the EXPLAIN ANALYSE results for a single row
in the inner query:

Nested Loop Semi Join  (cost=1000993.81..10004731160.70 rows=536206 width=28) (actual time=93765.182..93765.183 rows=0 loops=1)
  Output: fu.w2_page_idxs
  Join Filter: (fu.w2_page_idxs && (ARRAY[fact_pages.idx]))
  Rows Removed by Join Filter: 53762825
  Buffers: shared hit=569194 read=2821768
  I/O Timings: read=56586.955
  ->  Seq Scan on public.fact_users fu  (cost=100.00..10003925857.68 rows=53620568 width=28) (actual time=79.139..67423.779 rows=53762825 loops=1)
        Output: fu.w2_page_idxs
        Buffers: shared hit=567884 read=2821768
        I/O Timings: read=56586.955
  ->  Materialize  (cost=993.81..994.50 rows=1 width=32) (actual time=0.000..0.000 rows=1 loops=53762825)
        Output: (ARRAY[fact_pages.idx])
        Buffers: shared hit=148
        ->  Limit  (cost=993.81..994.48 rows=1 width=32) (actual time=26.382..26.383 rows=1 loops=1)
              Output: (ARRAY[fact_pages.idx])
              Buffers: shared hit=148
              ->  Bitmap Heap Scan on public.fact_pages  (cost=993.81..70645.00 rows=103556 width=32) (actual time=26.378..26.379 rows=1 loops=1)
                    Output: ARRAY[fact_pages.idx]
                    Recheck Cond: (fact_pages.attribute_idxs && '{30160}'::integer[])
                    Heap Blocks: exact=1
                    Buffers: shared hit=148
                    ->  Bitmap Index Scan on fact_pages_attribute_idxs_int  (cost=0.00..967.92 rows=103556 width=0) (actual time=14.865..14.865 rows=101462 loops=1)
                          Index Cond: (fact_pages.attribute_idxs && '{30160}'::integer[])
                          Buffers: shared hit=147
Query Identifier: 6779965332684941204
Planning:
  Buffers: shared hit=2
Planning Time: 0.162 ms
JIT:
  Functions: 10
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 1.507 ms, Inlining 9.797 ms, Optimization 54.902 ms, Emission 14.314 ms, Total 80.521 ms
Execution Time: 93766.772 ms


Query:

EXPLAIN (ANALYZE, VERBOSE, COSTS, BUFFERS, TIMING)
SELECT fu.w2_page_idxs
FROM fact_users AS fu
WHERE EXISTS (
    SELECT
    FROM (
        SELECT ARRAY[idx] AS page_idx
        FROM fact_pages
        WHERE attribute_idxs && ARRAY[30160]
        FETCH FIRST 1 ROWS ONLY
    ) AS fp
    WHERE fu.w2_page_idxs && fp.page_idx
);


Unsurprisingly, the planner is using a sequential scan on the very large
"fact_users" table instead of the GIN index on the "w2_page_idxs" column.

Link to the query plan visualiser: https://explain.dalibo.com/plan/1vC

Thank you very much in advance,

Mickael

On Wed, Apr 27, 2022 at 4:54 PM Mickael van der Beek <
[email protected]> wrote:

> Hello Jeff,
>
> I have waited a few hours without the query ever finishing, which is why
> I said "never finishes".
> Especially because the INNER JOIN version finishes within a few minutes
> while being combinatorial and less efficient.
> The query probably only does sequential scans.
>
> You will find the query plan using EXPLAIN here:
> - Visual query plan: https://explain.dalibo.com/plan#plan
> - Raw query plan: https://explain.dalibo.com/plan#raw
>
> Thanks for your help,
>
> Mickael
>
> On Wed, Apr 27, 2022 at 4:28 PM Jeff Janes  wrote:
>
>> On Wed, Apr 27, 2022 at 8:19 AM Mickael van der Beek <
>> [email protected]> wrote:
>>
>>>
>>> The last query does not finish after waiting for more than 15 minutes.
>>> (The temporary view creation is very fast and is required because the
>>> same query in a CTE greatly reduces performance (by more than 5 min.)
>>> due to the optimisation barrier, I'm guessing.)
>>>
>>
>> How much over 15 minutes?  20 minutes doesn't seem that long to wait to
>> get a likely definitive answer.  But at the least, show us the EXPLAIN
>> without ANALYZE of it; that should take mere milliseconds.
>>
>> And what does it mean for something to take 5 minutes longer than "never
>> finishes"?
>>
>> (Also, putting every or every other token on a separate line does not
>> make it easier to read)
>>
>> Cheers,
>>
>> Jeff
>>
>>>
>
> --
> Mickael van der Beek
> Web developer & Security analyst
>
> [email protected]
>


-- 
Mickael van der Beek
Web developer & Security analyst

[email protected]


Re: Selecting RAM and CPU based on max_connections

2022-05-20 Thread aditya desai
Thanks! I will run these suggestions past the App team.

On Fri, May 20, 2022 at 4:01 PM Laurenz Albe 
wrote:

> On Fri, 2022-05-20 at 12:15 +0200, Andreas Kretschmer wrote:
> > On 20 May 2022 10:27:50 CEST, aditya desai  wrote:
> > > One of our applications needs 3000 max_connections to the database.
> > > Connection poolers like pgbouncer or pgpool are not certified within the
> > > organization yet, so they are looking at setting up high-specification
> > > hardware with enough CPU and memory. Can someone advise how much memory
> > > and CPU they will need if they want max_connections = 3000?
> >
> > Pgbouncer would be the best solution. CPU: number of concurrent connections.
> > RAM: shared_buffers + max_connections * work_mem + maintenance_work_mem +
> > operating system + ...
>
> Right.  And then hope and pray that a) the database doesn't get overloaded
> and b) you don't hit any of the database-internal bottlenecks caused by
> many connections.
>
> I also got the feeling that the Linux kernel's memory accounting somehow
> lags. I have seen cases where every snapshot of "pg_stat_activity" I took
> showed only a few active connections (but each time different ones), but
> the amount of allocated memory exceeded what the currently active sessions
> could consume.  I may have made a mistake, and I have no reproducer, but I
> would be curious to know if there is an explanation for that.
> (I am aware that "top" shows shared buffers multiple times).
>
> Yours,
> Laurenz Albe
>


Re: Selecting RAM and CPU based on max_connections

2022-05-20 Thread Ganesh Korde
You may also need to tune the shmmax and shmmin kernel parameters.
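
For example (an illustrative sketch only; shmmax is usually tuned together
with shmall, and PostgreSQL 9.3+ needs very little System V shared memory,
so verify it is actually a problem first):

# /etc/sysctl.d/30-postgresql-shm.conf -- hypothetical file name and values
kernel.shmmax = 17179869184    # max size of one shared memory segment (16 GB, in bytes)
kernel.shmall = 4194304        # total shared memory allowed (16 GB, in 4 kB pages)

Apply with "sysctl --system" or at the next reboot.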

Regards,
Ganesh Korde.

On Fri, 20 May 2022, 1:58 pm aditya desai,  wrote:

> Hi,
> One of our applications needs 3000 max_connections to the database.
> Connection poolers like pgbouncer or pgpool are not certified within the
> organization yet, so they are looking at setting up high-specification
> hardware with enough CPU and memory. Can someone advise how much memory
> and CPU they will need if they want max_connections = 3000?
>
> Regards,
> Aditya.
>