Re: BUG #16968: Planner does not recognize optimization

2021-05-14 Thread David Rowley
On Fri, 14 May 2021 at 02:38, Eugen Konkov  wrote:
> Now I create minimal reproducible test case.
> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=761a00fb599789d3db31b120851d6341
>
> Optimization is not applied when I filter/partition by column using composite 
> type name.

You probably already know this part, but let me explain it just in
case it's not clear.

The pushdown of the qual from the top-level query into the subquery,
or function in this case, is only legal when the qual references a
column that's in the PARTITION BY clause of all window functions in
the subquery.  The reason for this is that if we filter rows before
calling the window function, it could affect which rows the window
function sees in its frame, and that could cause incorrect results.
We can relax the restriction a bit when we can eliminate entire
partitions at once: the window function results are independent
between partitions, so we can allow pushdown of quals that reference
columns appearing in all PARTITION BY clauses.
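
As a minimal illustration of that rule (my own example, not part of the
original report):

create table t (a int, b int);

-- pushdown allowed: the qual is on "a", which is in the PARTITION BY of
-- every window function, so filtering only removes whole partitions
explain select * from (select a, b, row_number() over
(partition by a) from t) t where t.a = 1;

-- pushdown not allowed: filtering on "b" could change which rows each
-- window frame sees
explain select * from (select a, b, row_number() over
(partition by a) from t) t where t.b = 1;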

As for the reason you're having trouble getting this to work, it's
down to the way you're using whole-row vars in your targetlist.

A slightly simplified case which shows this problem is:

create table ab(a int, b int);
explain select * from (select ab as wholerowvar,row_number() over
(partition by a) from ab) ab where (ab.wholerowvar).a=1;

The reason it does not work comes down to how this is implemented
internally.  Specifically, transformGroupClause() does not assign a
ressortgroupref to the whole-row var.  It's unable to because there is
no way to track which actual column within the whole-row var is in the
PARTITION BY clause.  When it comes to the code that tries to push the
qual down into the subquery, check_output_expressions() checks whether
each column in the subquery is OK to accept pushdowns.  One of the
checks is to see if the query has window functions and, if so, to
ensure that the column is in the PARTITION BY clause of every window
function.  That check is done by checking if a ressortgroupref is
assigned and matches a tleSortGroupRef in the PARTITION BY clause.  In
this case, it does not match, since we didn't assign any
ressortgroupref to the whole-row var.

Unfortunately, whole-row vars are a bit of a second-class citizen when
it comes to the query planner.  Also, it would be quite a bit of effort
to make the planner push down the qual in this case.  We'd need some
sort of ability to assign a ressortgroupref to a particular column
within a whole-row var, and we'd need to adjust the subquery pushdown
code to check for that, so it could work out which columns within
whole-row vars can legally accept pushdowns.  I imagine that's unlikely
to be fixed any time soon.  Whole-row vars just don't seem to be used
commonly enough to warrant going to the effort of making this stuff
work.

To work around this, you should include a reference to the actual
column in the targetlist of the subquery (or your function, in this
case), and ensure you use that same column in the PARTITION BY clause.
You'll then need to reference that column in the condition that you
need pushed down into the subquery.  I'm sorry if that messes up your
design.  However, I imagine this is not the only optimisation that
you'll miss out on by doing things the way you are.
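
A sketch of that workaround, using the simplified "ab" example from above
(my wording, not from the original mail):

explain select * from (select ab as wholerowvar, a, row_number() over
(partition by a) from ab) ab where ab.a = 1;

With the plain column "a" in the subquery's targetlist and in the
PARTITION BY clause, the qual on ab.a can be pushed down into the
subquery.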

David




RE: Re: PostgreSQL blocked locks query

2021-05-14 Thread Manoj Kumar
Hi ,

Thank you for the reply. Is there a similar way to extract the same 
information from a SQL command?

Thanks

-Original Message-
From: Justin Pryzby 
Sent: Thursday, May 13, 2021 10:09 PM
To: Manoj Kumar 
Cc: [email protected]
Subject: [EXT MSG] Re: PostgreSQL blocked locks query


On Thu, May 13, 2021 at 01:54:32PM +, Manoj Kumar wrote:
> I have query in terms of lock monitoring in PostgreSQL where I am not able to 
> find a way to figure out what value has been passed in SQL statement (from 
> JDBC driver as prepared statement).
>
> I am using PostgreSQL 13 version.
>
> The following is the SQL statement I am running in PGAdmin
>
> The output I am getting is below, where in the SQL query is with $1.
>
> [cid:[email protected]]
>
> Is there a way to compute what is being passed as value for the above SQL 
> statement ?

You should enable query logging, and pull the params out of the log.

Note that v13 has log_parameter_max_length, which defaults to showing params in 
full.
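
For example (a sketch; it assumes superuser access and that reloading the
configuration is acceptable on your system):

ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();
SHOW log_parameter_max_length;  -- -1 (the default) logs bind parameters in full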

[pryzbyj@telsasoft2019 ~]$ PGOPTIONS='-c log_min_duration_statement=0 -c 
client_min_messages=debug' python3 -c "import pg; db=pg.DB('postgres'); 
q=db.query('SELECT \$1', 1)"
DEBUG:  loaded library "auto_explain"
DEBUG:  parse <unnamed>: SELECT $1
LOG:  duration: 0.230 ms  parse <unnamed>: SELECT $1
DEBUG:  bind <unnamed> to <unnamed>
LOG:  duration: 0.141 ms  bind <unnamed>: SELECT $1
DETAIL:  parameters: $1 = '1'
LOG:  duration: 0.029 ms  execute <unnamed>: SELECT $1
DETAIL:  parameters: $1 = '1'

--
Justin





Very slow "bloat query"

2021-05-14 Thread Marcin Gozdalik
Hi

I am trying to use `pgmetrics` on a big (10TB+), busy (1GB/s RW) database.
It takes around 5 minutes for pgmetrics to run. I traced the problem to the
"bloat query" (version of
https://wiki.postgresql.org/wiki/Show_database_bloat) spinning in CPU,
doing no I/O.

I have traced the problem to the bloated `pg_class` (the irony: `pgmetrics`
does not collect bloat on `pg_catalog`):
`vacuum (full, analyze, verbose) pg_class;`
```
INFO:  vacuuming "pg_catalog.pg_class"
INFO:  "pg_class": found 1 removable, 7430805 nonremovable row versions in
158870 pages
DETAIL:  7429943 dead row versions cannot be removed yet.
CPU 1.36s/6.40u sec elapsed 9.85 sec.
INFO:  analyzing "pg_catalog.pg_class"
INFO:  "pg_class": scanned 6 of 158869 pages, containing 295 live rows
and 2806547 dead rows; 295 rows in sample, 781 estimated total rows
VACUUM
```

`pg_class` has so many dead rows because the workload is temp-table heavy
(creating/destroying 1M+ temporary tables per day) and has long running
analytics queries running for 24h+.

The PG query planner assumes that an index scan on `pg_class` will be very
quick and plans a nested loop with an index scan. However, the index scan has
7M dead tuples to filter out and the query takes more than 200 seconds
(https://explain.depesz.com/s/bw2G).
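
A quick way to confirm the catalog bloat first (a sketch using the standard
statistics views):
```
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_sys_tables
WHERE schemaname = 'pg_catalog'
  AND relname IN ('pg_class', 'pg_attribute', 'pg_statistic')
ORDER BY n_dead_tup DESC;
```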

If I create a temp table from `pg_class` to contain only the live tuples:
```
CREATE TEMPORARY TABLE pg_class_alive AS SELECT oid,* from pg_class;
CREATE UNIQUE INDEX pg_class_alive_oid_index ON pg_class_alive(oid);
CREATE UNIQUE INDEX pg_class_alive_relname_nsp_index ON
pg_class_alive(relname, relnamespace);
CREATE INDEX pg_class_tblspc_relfilenode_index ON
pg_class_alive(reltablespace, relfilenode);
ANALYZE pg_class_alive;
```

and run the bloat query on `pg_class_alive` instead of `pg_class`:
```
SELECT
   nn.nspname AS schemaname,
   cc.relname AS tablename,
   COALESCE(cc.reltuples,0) AS reltuples,
   COALESCE(cc.relpages,0) AS relpages,
   COALESCE(CEIL((cc.reltuples*((datahdr+8-
 (CASE WHEN datahdr%8=0 THEN 8 ELSE datahdr%8
END))+nullhdr2+4))/(8192-20::float)),0) AS otta
 FROM
pg_class_alive cc
 JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname <>
'information_schema'
 LEFT JOIN
 (
   SELECT
 foo.nspname,foo.relname,
 (datawidth+32)::numeric AS datahdr,
 (maxfracsum*(nullhdr+8-(case when nullhdr%8=0 THEN 8 ELSE nullhdr%8
END))) AS nullhdr2
   FROM (
 SELECT
   ns.nspname, tbl.relname,
   SUM((1-coalesce(null_frac,0))*coalesce(avg_width, 2048)) AS
datawidth,
   MAX(coalesce(null_frac,0)) AS maxfracsum,
   23+(
 SELECT 1+count(*)/8
 FROM pg_stats s2
 WHERE null_frac<>0 AND s2.schemaname = ns.nspname AND s2.tablename
= tbl.relname
   ) AS nullhdr
 FROM pg_attribute att
 JOIN pg_class_alive tbl ON att.attrelid = tbl.oid
 JOIN pg_namespace ns ON ns.oid = tbl.relnamespace
 LEFT JOIN pg_stats s ON s.schemaname=ns.nspname
 AND s.tablename = tbl.relname
 AND s.inherited=false
 AND s.attname=att.attname
 WHERE att.attnum > 0 AND tbl.relkind='r'
 GROUP BY 1,2
   ) AS foo
 ) AS rs
 ON cc.relname = rs.relname AND nn.nspname = rs.nspname
 LEFT JOIN pg_index i ON indrelid = cc.oid
 LEFT JOIN pg_class_alive c2 ON c2.oid = i.indexrelid
```

it runs in 10s, 20x faster (https://explain.depesz.com/s/K4SH)

The rabbit hole probably goes deeper (e.g. should do the same for
pg_statistic and pg_attribute and create a new pg_stats view).

I am not able (at least not quickly) to change the number of temporary tables
created or to make the analytics queries finish quicker. Apart from the above
hack of filtering out live tuples into a separate table, is there anything I
could do?

Thank you,
Marcin Gozdalik

-- 
Marcin Gozdalik


Re: Very slow "bloat query"

2021-05-14 Thread Gilles Darold

On 14/05/2021 at 13:06, Marcin Gozdalik wrote:

> [...]

Hi,


To avoid bloating your catalog with temporary tables you can try using 
https://github.com/darold/pgtt-rsl. I don't know whether it will match your 
performance requirements, but at least you will not bloat the catalog anymore.


About your hack, I don't see any other solution except running vacuum on the 
catalog tables more often, but I guess that is already done or is not 
possible. In any case, not bloating the catalog to such a level is the right 
solution.



--
Gilles Darold
http://www.darold.net/






Re: Very slow "bloat query"

2021-05-14 Thread Imre Samu
> Apart from the above hack of filtering out live tuples to a separate
> table is there anything I could do?

Is this the latest PG 13.3 version?

IMHO: if not, it may be worth updating to the latest patch release as soon
as possible.

https://www.postgresql.org/docs/release/13.3/
Release date: 2021-05-13
*"Disable the vacuum_cleanup_index_scale_factor parameter and storage
option (Peter Geoghegan)*
*The notion of tracking “stale” index statistics proved to interact badly
with the autovacuum_vacuum_insert_threshold parameter, resulting in
unnecessary full-index scans and consequent degradation of autovacuum
performance. The latter mechanism seems superior, so remove the
stale-statistics logic. The control parameter for that,
vacuum_cleanup_index_scale_factor, will be removed entirely in v14. In v13,
it remains present to avoid breaking existing configuration files, but it
no longer does anything."*

best,
 Imre


Marcin Gozdalik wrote (on Fri, 14 May 2021 at 13:20):

> [...]


Re: Very slow "bloat query"

2021-05-14 Thread Tom Lane
Marcin Gozdalik  writes:
> I have traced the problem to the bloated `pg_class` (the irony: `pgmetrics`
> does not collect bloat on `pg_catalog`):
> `vacuum (full, analyze, verbose) pg_class;`
> ```
> INFO:  vacuuming "pg_catalog.pg_class"
> INFO:  "pg_class": found 1 removable, 7430805 nonremovable row versions in
> 158870 pages
> DETAIL:  7429943 dead row versions cannot be removed yet.

Ugh.  It's understandable that having a lot of temp-table traffic
would result in the creation of lots of dead rows in pg_class.
The question to be asking is why aren't they vacuumable?  You
must have a longstanding open transaction somewhere (perhaps
a forgotten prepared transaction?) that is holding back the
global xmin horizon.  Closing that out and then doing another
manual VACUUM FULL should help.
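
For instance (my sketch, using the standard views), the offender can
usually be spotted with:

select pid, state, xact_start, backend_xmin, query
from pg_stat_activity
where backend_xmin is not null
order by age(backend_xmin) desc;

select gid, prepared, owner, database from pg_prepared_xacts;

A very old backend_xmin in the first query, or an old prepared
transaction in the second, is what is holding back the xmin horizon.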

regards, tom lane




Re: Very slow "bloat query"

2021-05-14 Thread Marcin Gozdalik
Unfortunately it's still 9.6. Upgrade to latest 13 is planned for this year.

On Fri, 14 May 2021 at 12:08, Imre Samu wrote:

> [...]

Re: BUG #16968: Planner does not recognize optimization

2021-05-14 Thread KES
Thank you for the detailed explanation. I am glad to hear that I can use
aliases and that this will be recognized and the optimization applied.

> We'd need some sort of ability to assign ressortgroupref to a particular
> column within a whole-row var

Could it be possible to create a hidden alias in the same way as I did it
manually? The algorithm seems not complex:

1. The user refers to a column from a composite type/whole-row var: (o).agreement_id
2. Create a hidden column in the select list: _o_agreement_id
3. Replace other references to (o).agreement_id with _o_agreement_id
4. Process the query as usual after the replacements

14.05.2021, 02:52, "David Rowley":
> [...]

Re: Very slow "bloat query"

2021-05-14 Thread Imre Samu
> Unfortunately it's still 9.6.

And what is your "version()"?


for example:
postgres=# select version();
 version
---------------------------------------------------------------------------
 PostgreSQL 9.6.22 on x86_64-pc-linux-gnu (Debian 9.6.22-1.pgdg110+1),
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
(1 row)

Imre


Marcin Gozdalik wrote (on Fri, 14 May 2021 at 14:11):

> [...]

Re: Very slow "bloat query"

2021-05-14 Thread Marcin Gozdalik
There is a long-running analytics query (which usually runs for 30-40
hours). I agree that's not the best position to be in, but right now I
can't do anything about it.

On Fri, 14 May 2021 at 15:04, Tom Lane wrote:

> Marcin Gozdalik  writes:
> > I have traced the problem to the bloated `pg_class` (the irony:
> `pgmetrics`
> > does not collect bloat on `pg_catalog`):
> > `vacuum (full, analyze, verbose) pg_class;`
> > ```
> > INFO:  vacuuming "pg_catalog.pg_class"
> > INFO:  "pg_class": found 1 removable, 7430805 nonremovable row versions
> in
> > 158870 pages
> > DETAIL:  7429943 dead row versions cannot be removed yet.
>
> Ugh.  It's understandable that having a lot of temp-table traffic
> would result in the creation of lots of dead rows in pg_class.
> The question to be asking is why aren't they vacuumable?  You
> must have a longstanding open transaction somewhere (perhaps
> a forgotten prepared transaction?) that is holding back the
> global xmin horizon.  Closing that out and then doing another
> manual VACUUM FULL should help.
>
> regards, tom lane
>


-- 
Marcin Gozdalik


Re: Very slow "bloat query"

2021-05-14 Thread Marcin Gozdalik
 PostgreSQL 9.6.21 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-44), 64-bit

On Fri, 14 May 2021 at 15:45, Imre Samu wrote:

> [...]