Re: Using system tables directly takes many hours, using temp tables with no indexes takes a few seconds for geometry_columns view.

2022-03-24 Thread Lars Aksel Opsahl

From: Justin Pryzby 
Sent: Wednesday, March 23, 2022 2:19 PM

>On Wed, Mar 23, 2022 at 09:44:09AM +, Lars Aksel Opsahl wrote:
>> Why are temp tables with no indexes much faster than system tables with indexes?
>
>I think the "temp table" way is accidentally faster due to having no
>statistics, not because it has no indexes.  If you run ANALYZE, you may hit the
>same issue (or, maybe you just need to VACUUM ANALYZE your system catalogs).

Hi

Sorry, I misread your mail; you are totally right.

Before running VACUUM, the execution time was 9422.964 ms (00:09.423).

Then I ran VACUUM as you suggested:
VACUUM ANALYZE pg_attribute_temp;
VACUUM ANALYZE pg_namespace_temp;
VACUUM ANALYZE pg_type_temp;
VACUUM ANALYZE pg_constraint_temp;

Now I can wait for 10 minutes and it just hangs, so yes, we have the same problem 
as you suggested.
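For reference, the equivalent check on the real catalogs that back the geometry_columns view would be along these lines (a sketch; running VACUUM on system catalogs requires sufficient privileges):

```sql
-- Sketch: vacuum and re-analyze the real system catalogs used by
-- the geometry_columns view, so the planner gets fresh statistics.
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_class;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_attribute;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_namespace;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_type;
VACUUM (VERBOSE, ANALYZE) pg_catalog.pg_constraint;
```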

The original query "select * from geometry_columns" finally finished after 
almost 9 hours.

The plan is here https://explain.depesz.com/s/jGXf

I did some more testing: if I remove the LEFT JOIN to pg_constraint, it runs in 
less than a minute and returns 75219 rows.

WITH geo_column_list AS (SELECT
current_database()::character varying(256) AS f_table_catalog,
n.nspname AS f_table_schema,
n.oid AS n_oid,
c.relname AS f_table_name,
c.oid AS c_oid,
a.attname AS f_geometry_column,
a.attnum AS a_attnum
--COALESCE(postgis_typmod_dims(a.atttypmod), sn.ndims, 2) AS coord_dimension,
--COALESCE(NULLIF(postgis_typmod_srid(a.atttypmod), 0), sr.srid, 0) AS srid,
--replace(replace(COALESCE(NULLIF(upper(postgis_typmod_type(a.atttypmod)), 'GEOMETRY'::text), st.type, 'GEOMETRY'::text), 'ZM'::text, ''::text), 'Z'::text, ''::text)::character varying(30) AS type
   FROM pg_class c
 JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped
 JOIN pg_namespace n ON c.relnamespace = n.oid
 JOIN pg_type t ON a.atttypid = t.oid
  WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'm'::"char", 'f'::"char", 'p'::"char"]))
  AND NOT c.relname = 'raster_columns'::name AND t.typname = 'geometry'::name
  AND NOT pg_is_other_temp_schema(c.relnamespace)
  AND has_table_privilege(c.oid, 'SELECT'::text)
)
SELECT * FROM geo_column_list;

But if I try this with the LEFT JOINs, it seems to hang for hours.

WITH geo_column_list AS (SELECT
current_database()::character varying(256) AS f_table_catalog,
n.nspname AS f_table_schema,
n.oid AS n_oid,
c.relname AS f_table_name,
c.oid AS c_oid,
a.attname AS f_geometry_column,
a.attnum AS a_attnum,
a.atttypmod
--COALESCE(NULLIF(postgis_typmod_srid(a.atttypmod), 0), sr.srid, 0) AS srid,
--replace(replace(COALESCE(NULLIF(upper(postgis_typmod_type(a.atttypmod)), 'GEOMETRY'::text), st.type, 'GEOMETRY'::text), 'ZM'::text, ''::text), 'Z'::text, ''::text)::character varying(30) AS type
   FROM pg_class c
 JOIN pg_attribute a ON a.attrelid = c.oid AND NOT a.attisdropped
 JOIN pg_namespace n ON c.relnamespace = n.oid
 JOIN pg_type t ON a.atttypid = t.oid
  WHERE (c.relkind = ANY (ARRAY['r'::"char", 'v'::"char", 'm'::"char", 'f'::"char", 'p'::"char"]))
  AND NOT c.relname = 'raster_columns'::name AND t.typname = 'geometry'::name
  AND NOT pg_is_other_temp_schema(c.relnamespace)
  AND has_table_privilege(c.oid, 'SELECT'::text)
),
pg_constraint_list AS (SELECT pg_constraint.connamespace,
pg_constraint.conrelid,
pg_constraint.conkey,
pg_get_constraintdef(pg_constraint.oid) AS consrc
   FROM pg_constraint, geo_column_list
WHERE connamespace = n_oid AND conrelid = c_oid AND (a_attnum = ANY (conkey))
),
geo_column_list_full AS (SELECT * FROM geo_column_list
 LEFT JOIN ( SELECT s.connamespace,
s.conrelid,
s.conkey,
replace(split_part(s.consrc, ''''::text, 2), ')'::text, ''::text) AS type
   FROM pg_constraint_list s
  WHERE s.consrc ~~* '%geometrytype(% = %'::text) st ON TRUE
 LEFT JOIN ( SELECT s.connamespace,
s.conrelid,
s.conkey,
replace(split_part(s.consrc, ' = '::text, 2), ')'::text, ''::text)::integer AS ndims
   FROM pg_constraint_list s
  WHERE s.consrc ~~* '%ndims(% = %'::text) sn ON TRUE
 LEFT JOIN ( SELECT s.connamespace,
s.conrelid,
s.conkey,
replace(replace(split_part(s.consrc, ' = '::text, 2), ')'::text, ''::text), '('::text, ''::text)::integer AS srid
   FROM pg_constraint_list s
  WHERE s.consrc ~~* '%srid(% = %'::text) sr ON TRUE
)
SELECT *,
COALESCE(postgis_typmod_dims(atttypmod), ndims, 2) AS coord_dimension
FROM geo_column_list_full;
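In the spirit of this thread's subject, one workaround would be to materialize geo_column_list as an analyzed temp table before joining pg_constraint against it (a sketch; the table name is made up, and the SELECT body is the geo_column_list CTE above):

```sql
-- Sketch: replace the geo_column_list CTE with an analyzed temp table,
-- so the planner has real statistics for the join against pg_constraint.
CREATE TEMP TABLE geo_column_list_t AS
SELECT ...;          -- same SELECT body as the geo_column_list CTE above
ANALYZE geo_column_list_t;
-- then reference geo_column_list_t instead of the CTE in the rest of the query
```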

But if I try this, it returns 648 rows in less than a second:

WITH geo_column_list AS (SELECT
current_database()::character varying(256) AS f_table_catalog,
n.nspname AS f_table_schema,
n.oid AS n_oid,
c.relname AS f_table_name,
c.oid AS c_oid,
a.attname A

Performance issue post upgrade on Version 13 - Incorrect Estimation Cost choosing Hash Aggregate-Nested Left Loop Join

2022-03-24 Thread Prajna Shetty
++ 
[email protected]

Hello Team,

There is a change in the query plan between version 12.4 and version 13, 
resulting in performance slowness post upgrade.

*   In version 12.4, a Group Aggregate with a Sort operation is selected, which 
leads to a Merge Join. The query takes ~5 seconds.
*   In version 13.5, the optimizer estimates incorrectly and, due to the new 
Disk-Based Hash Aggregate feature, prefers a Hash Aggregate instead of the Sort 
operation, which blocks the merge join and results in a Nested Loop Left Join. 
The query takes ~5 minutes.

  NOTE: Disabling Hash Aggregate at the instance level forces the optimizer to 
choose the merge operation, but such an instance-level modification is not 
possible in terms of application functionality.
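A narrower alternative to an instance-level change would be to scope the setting to just the affected session or transaction (a sketch; whether this is workable depends on the application's access to the session):

```sql
-- Sketch: disable hash aggregation only for the affected query,
-- leaving the instance-level setting untouched. SET LOCAL reverts
-- automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL enable_hashagg = off;
-- run the affected query here
COMMIT;
```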

This performance issue affects most of our queries. Attached is one of the 
queries and its plan in both versions, in case that helps with recreating the 
issue.

The version 13 query plan has a lower estimated cost than that of 12.4, which 
implies the 13.5 planner thought it had found a better plan, but the query runs 
slower and the actual time is higher.

12.4 Version:
"Merge Right Join  (cost=202198.78..295729.10 rows=1 width=8) (actual 
time=1399.727..5224.574 rows=296 loops=1)"

13.5 version:
"Nested Loop Left Join  (cost=196360.90..287890.45 rows=1 width=8) (actual 
time=3209.577..371300.693 rows=296 loops=1)"


Details:
1.  It is an AWS Aurora PostgreSQL RDS instance. We have raised a case with AWS, 
and since this issue is a regression coming from the community PostgreSQL code, 
we would like to raise the bug here.
2.  We were upgrading from version 12.4 to 13.4 and later.
3.  vCPU: 2 , RAM: 8 GB
4.  Attached are stats for all tables in this schema for your reference.

5.  Attached is metadata for one of the tables, person, for your reference.

We have tried the steps below, but they did not help:

1.  We have performed Vacuum/Analyze/Reindex post Upgrade.
2.  Tweaked work_mem so it does not spill to disk. We no longer see disk usage, 
but it is still using Hash Aggregate; the time came down from 5 minutes to 20 
seconds (expected ~5 seconds). Attached is the plan after modifying work_mem.

3.  Disabled seqscan / nested loop.
4.  Tweaked random_page_cost/seq_page_cost
5.  Set default_statistics_target=1000 and then ran VACUUM (ANALYZE, VERBOSE) 
on selected tables.
6.  We have also tested performance by increasing resources up to 4 vCPU 
and 32 GB RAM.
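For diagnosing the regression, the plan details the list usually asks for can be collected with something like the following (a sketch; substitute the real query for the placeholder):

```sql
-- Sketch: capture the executed plan with buffer usage and any
-- non-default planner settings for the slow query (SETTINGS is
-- available from PostgreSQL 12 onward).
EXPLAIN (ANALYZE, BUFFERS, SETTINGS)
SELECT ...;  -- the affected query
```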

Could you please check and confirm whether this incorrect cost estimation is a 
known concern in version 13, where in some cases the optimizer prefers a Hash 
Aggregate and Nested Loop Left Join instead of a Merge Join?



Thanks & Regards,

Prajna Shetty
Technical Specialist,
Data Platform Support & Delivery





-Original Message-
From: Tom Lane 
Sent: Monday, March 21, 2022 7:29 PM
To: Prajna Shetty 
Cc: [email protected]; Beenu Sharma 
Subject: Re: Performance issue post upgrade on Version 13 - Incorrect 
Estimation Cost choosing Hash Aggregate-Nested Left Loop Join


Prajna Shetty <[email protected]> writes:
> There is a change in the query plan between version 12.4 and version 13, 
> resulting in performance slowness post upgrade.

Standard upgrade methods don't transfer statistics from the old version, so the 
first question to ask is have you ANALYZE'd the relevant tables since upgrading?

If you have, then to offer useful help with this we'll need to see all the 
details described in

https://wiki.postgresql.org/wiki/Slow_Query_Questions

In any case, this is unlikely to be a bug.  The pgsql-performance list would be 
a more suitable place to discuss it.

regards, tom lane



  



Attachment: srs.sql
"Nested Loop Left Join  (cost=196360.90..287890.45 rows=1 width=8) (actual 
time=3209.577..371300.693 rows=296 loops=1)"
"  Join Filter: (s.aamc_id = ssd.aamc_id)"
"  Rows Removed by Join Filter: 60348903"
"  ->  Nested Loop  (cost=15232.70..101055.49 rows=1 width=12) (actual 
time=470.584..2428.982 rows=294 loops=1)"
"->  Nested Loop  (cost=15232.56..101055.33 rows=1 width=17) (actual 
time=470.552..2424.584 rows=294 loops=1)"
"  ->  Nested Loop  (cost=15232.43..101055.17 rows=1 width=19) 
(actual time=470.532..2420.321 rows=294 loops=1)"
"->  Gather Merge  (cost=15232.29..101055.01 rows=1 
width=20) (actual time=470.485..2411.475 r

Re: Performance issue post upgrade on Version 13 - Incorrect Estimation Cost choosing Hash Aggregate-Nested Left Loop Join

2022-03-24 Thread Justin Pryzby
On Tue, Mar 22, 2022 at 12:57:10PM +, Prajna Shetty wrote:
> 1.  We have performed Vacuum/Analyze/Reindex post Upgrade.
> 2.  Tweaked work_mem so it does not spill to disk. We no longer see disk 
> usage, but it is still using Hash Aggregate; the time came down from 5 minutes 
> to 20 seconds (expected ~5 seconds). Attached plan after modifying work_mem.
> 3.  Disabled Seqcan/ nestedloop
> 4.  Tweaked random_page_cost/seq_page_cost
> 5.  Set default_statistics_target=1000 and then run 
> vacuum(analyze,verbose) on selected tables.
> 6.  We have also tested performance by increasing resources up to 4 vCPU 
> and 32 GB RAM.

Would you provide your current settings?
https://wiki.postgresql.org/wiki/Server_Configuration