Re: Performance issue post upgrade on Version 13 - Incorrect Estimation Cost choosing Hash Aggregate-Nested Left Loop Join

2022-03-26 Thread Tomas Vondra



On 3/22/22 13:57, Prajna Shetty wrote:
> ++ [email protected]
> 
>  
> Hello Team,
>  
> There is a change in the query plan between version 12.4 and version 13,
> resulting in performance slowness after the upgrade.
>  
> 
>   * In version 12.4, a Sort plus GroupAggregate is selected, which
> leads to a Merge Join. The query takes ~5 seconds.
>   * In version 13.5, the optimizer misestimates and, due to the new
> Disk-Based Hash Aggregate feature, prefers a Hash Aggregate instead
> of a Sort, which blocks the merge join and leads to a Nested Loop
> Left Join. The query takes ~5 minutes.
> 
>  
> NOTE: Disabling Hash Aggregate at the instance level forces the
> optimizer to choose the merge operation, but such an instance-level
> modification is not possible in terms of application functionality.
>  
> This performance issue affects most of our queries. Attached are one of
> the queries and its plan in both versions for reference, in case that
> helps recreate the issue.
>  

It's impossible to comment on those other queries, but chances are the
root cause is the same.

> The version 13 query plan has a lower estimated cost than that of 12.4,
> which implies the 13.5 planner thought it found a better plan, but it
> runs slower and the actual cost is higher.
>  
> 12.4 Version:
> "Merge Right Join  (cost=*202198.78..295729.10* rows=1 width=8) (actual
> time=1399.727..*5224.574* rows=296 loops=1)"
>  
> 13.5 version:-
> "Nested Loop Left Join  (cost=*196360.90..287890.45* rows=1 width=8)
> (actual time=3209.577..*371300.693* rows=296 loops=1)"
>  

This is not a costing issue, the problem is that we expect 1 row and
calculate the cost for that, but then get 296. And unfortunately a
nested loop degrades much faster than a merge join.

I'm not sure why exactly 12.4 picked a merge join; chances are the
costing formula changed a bit somewhere. But as I said, the problem is
the bogus row cardinality estimates - 12.4 is simply lucky.

The problem most likely stems from this part:

  ->  GroupAggregate  (cost=0.43..85743.24 rows=1830 width=72) (actual
time=1.621..3452.034 rows=282179 loops=3)
  Group Key: student_class_detail.aamc_id
  Filter: (max((student_class_detail.class_level_cd)::text) = '4'::text)
  Rows Removed by Filter: 76060
  ->  Index Scan using uk_student_class_detail_aamcid_classlevelcd on
student_class_detail  (cost=0.43..74747.61 rows=1284079 width=6) (actual
time=1.570..2723.014 rows=1272390 loops=3)
Filter: (class_level_start_dt IS NOT NULL)
Rows Removed by Filter: 160402

The filter is bound to be misestimated, and the error then snowballs.
Try replacing this part with a temporary table (with pre-aggregated
results) - you can run analyze on it, etc. I'd bet that'll make the
issue go away.
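
For illustration, the pre-aggregation could look roughly like this (table
and column names are taken from the plan above; the exact query shape is
an assumption, so adjust it to match your real query):

```sql
-- Sketch: materialize the pre-aggregated result so the planner
-- sees real statistics instead of the misestimated GroupAggregate.
CREATE TEMPORARY TABLE tmp_student_class AS
SELECT aamc_id
FROM   student_class_detail
WHERE  class_level_start_dt IS NOT NULL
GROUP  BY aamc_id
HAVING max(class_level_cd::text) = '4';

ANALYZE tmp_student_class;

-- Then join tmp_student_class in the main query in place of the
-- original aggregate subquery.
```

With the temporary table analyzed, the planner works from an accurate row
count instead of the rows=1830 guess, so the nested loop should no longer
look attractive.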

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: View taking time to show records

2022-03-26 Thread Kumar, Mukesh
Hi Albe,

Thanks for the suggestion below. When I ran the query with that parameter,
it took only 1 sec.

So could you please let me know if I can set this parameter to OFF at the
database level, and whether that will create any issues for other queries
running in the database?
 
Could you please shed some light on it.

Thanks and Regards, 
Mukesh Kumar

-Original Message-
From: Laurenz Albe  
Sent: Friday, March 25, 2022 4:13 PM
To: Kumar, Mukesh ; [email protected]
Subject: Re: View taking time to show records

On Thu, 2022-03-24 at 15:59 +, Kumar, Mukesh wrote:
> We have recently migrated from Oracle to PostgreSQL version 11.4 on an
> Azure PostgreSQL PaaS instance.
>  
> There is one query which takes approx. 10 secs in Oracle, and when we
> ran the same query on PostgreSQL it takes approx. 1 min.
>  
> Can anyone suggest how to improve the query, as from the application
> side a 1 min response time is not acceptable to the client.
>  
> Please find the query and the explain analyze report at the link below
>  
> https://explain.depesz.com/s/RLJn#stats

I would split the query in two parts: the one from line 3 to line 49 of your 
execution plan, and the rest.  The problem is the bad estimate of that first 
part, so execute only that, write the result to a temporary table and ANALYZE 
that.  Then execute the rest of the query using that temporary table.

Perhaps it is also enough to blindly disable nested loop joins for the whole 
query, rather than doing the right thing and fixing the estimates:

BEGIN;
SET LOCAL enable_nestloop = off;
SELECT ...;
COMMIT;
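
If the setting needs to apply more broadly than a single transaction, it
can also be scoped to one role or one database rather than the whole
instance (the role and database names below are placeholders; be aware
that this affects every query in that scope):

```sql
-- Broader scopes for the same setting (use with caution):
ALTER ROLE app_user SET enable_nestloop = off;   -- all sessions of one role
ALTER DATABASE mydb SET enable_nestloop = off;   -- all sessions in one database
```

Still, fixing the estimates is the better long-term answer, since
disabling nested loops everywhere will hurt queries where they are the
right choice.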

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com