PostgreSQL statistics

2023-11-17 Thread Paulo Silva
Hi,

I have a table in a PostgreSQL 11 server that isn't being used a lot.

If I run a manual ANALYZE on it, the last_analyze statistic isn't being
updated (I've waited long enough for it to be updated):

# SELECT * FROM pg_catalog.pg_stat_user_tables WHERE schemaname='s1' and relname='t1';
-[ RECORD 1 ]-------+------------------------------
relid               | 237177743
schemaname          | s1
relname             | t1
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 0
idx_tup_fetch       | 0
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 4820
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     | 2023-11-03 13:34:23.725802+00
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 1
analyze_count       | 0
autoanalyze_count   | 0

But if I run a "SELECT * FROM s1.t1" I see seq_scan increase to 1 (as
expected), and after that I can issue a manual ANALYZE and last_analyze
gets updated.

Is this behaviour normal? Are there some parameters that are checked that
prevent the ANALYZE from running?

Regards
-- 
Paulo Silva 


Re: PostgreSQL statistics

2023-11-17 Thread Adrian Klaver

On 11/17/23 02:25, Paulo Silva wrote:

> Hi,
>
> I have a table in a PostgreSQL 11 server that isn't being used a lot.
>
> If I run a manual ANALYZE on it, the last_analyze statistic isn't being
> updated (I've waited long enough for it to be updated):
>
> # SELECT * FROM pg_catalog.pg_stat_user_tables WHERE schemaname='s1' and relname='t1';
> -[ RECORD 1 ]-------+------------------------------
> relid               | 237177743
> schemaname          | s1
> relname             | t1
> seq_scan            | 0
> seq_tup_read        | 0
> idx_scan            | 0
> idx_tup_fetch       | 0
> n_tup_ins           | 0
> n_tup_upd           | 0
> n_tup_del           | 0
> n_tup_hot_upd       | 0
> n_live_tup          | 4820
> n_dead_tup          | 0
> n_mod_since_analyze | 0
> last_vacuum         |
> last_autovacuum     | 2023-11-03 13:34:23.725802+00
> last_analyze        |
> last_autoanalyze    |
> vacuum_count        | 0
> autovacuum_count    | 1
> analyze_count       | 0
> autoanalyze_count   | 0
>
> But if I run a "SELECT * FROM s1.t1" I see seq_scan increase to 1 (as
> expected), and after that I can issue a manual ANALYZE and last_analyze
> gets updated.
>
> Is this behaviour normal? Are there some parameters that are checked that
> prevent the ANALYZE from running?


The oldest Postgres instance I have access to is version 12, and there I see
the update to pg_stat_user_tables when I run ANALYZE in psql with autocommit on.


Are you maybe seeing the effects of this?:

https://www.postgresql.org/docs/11/monitoring-stats.html

"Another important point is that when a server process is asked to 
display any of these statistics, it first fetches the most recent report 
emitted by the collector process and then continues to use this snapshot 
for all statistical views and functions until the end of its current 
transaction. So the statistics will show static information as long as 
you continue the current transaction. Similarly, information about the 
current queries of all sessions is collected when any such information 
is first requested within a transaction, and the same information will 
be displayed throughout the transaction."



As in:

begin;

select relname, last_analyze from pg_stat_user_tables where relname = 'cell_per';

-[ RECORD 1 ]+------------------------------
relname      | cell_per
last_analyze | 2023-11-17 08:11:18.614987-08

analyze cell_per;
ANALYZE

select relname, last_analyze from pg_stat_user_tables where relname = 'cell_per';

-[ RECORD 1 ]+------------------------------
relname      | cell_per
last_analyze | 2023-11-17 08:11:18.614987-08

commit;

select relname, last_analyze from pg_stat_user_tables where relname = 'cell_per';

-[ RECORD 1 ]+------------------------------
relname      | cell_per
last_analyze | 2023-11-17 08:13:02.969537-08
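For completeness: the same docs page also mentions pg_stat_clear_snapshot(),
which discards the current transaction's statistics snapshot, so a later read
inside the same transaction fetches a fresh one. A minimal sketch, reusing the
cell_per table from above (note that a backend's own counters are only
forwarded to the collector after its transaction ends, so this mainly helps
for seeing other sessions' activity from within a long-running transaction):

begin;

select relname, last_analyze from pg_stat_user_tables where relname = 'cell_per';

-- ... meanwhile another session runs ANALYZE ...

-- discard this transaction's cached snapshot; the next read fetches a new one
select pg_stat_clear_snapshot();

select relname, last_analyze from pg_stat_user_tables where relname = 'cell_per';

commit;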




> Regards
> --
> Paulo Silva <paulo...@gmail.com>


--
Adrian Klaver
adrian.kla...@aklaver.com





Unused CTE affects result set

2023-11-17 Thread Gabriel Sánchez
Dear Postgres community,

I'm seeing unintuitive behavior with a query structured as follows:

WITH used_cte AS (SELECT a, b, c FROM t1 ...)
, unused_cte AS (SELECT d, e, f FROM t2 ...)
SELECT * FROM used_cte ...
-- returns no results

WITH used_cte AS (SELECT a, b, c FROM t1 ...)
-- , unused_cte AS (SELECT d, e, f FROM t2 ...)
SELECT * FROM used_cte ...
-- returns results

I don't understand why the presence of a CTE that ends up not being
included in the final query affects the result set.  Could this be a bug,
or is there something I don't understand?

PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu,
compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit

Thanks,
Gabriel


Re: Unused CTE affects result set

2023-11-17 Thread Adrian Klaver


On 11/17/23 10:30 AM, Gabriel Sánchez wrote:

> Dear Postgres community,
>
> I'm seeing unintuitive behavior with a query structured as follows:
>
> WITH used_cte AS (SELECT a, b, c FROM t1 ...)
> , unused_cte AS (SELECT d, e, f FROM t2 ...)
> SELECT * FROM used_cte ...
> -- returns no results
>
> WITH used_cte AS (SELECT a, b, c FROM t1 ...)
> -- , unused_cte AS (SELECT d, e, f FROM t2 ...)
> SELECT * FROM used_cte ...
> -- returns results
>
> I don't understand why the presence of a CTE that ends up not being
> included in the final query affects the result set. Could this be a
> bug, or is there something I don't understand?



I would say it has to do with what the '...' is doing.

In other words, I don't see this behavior here for a simple case.



> PostgreSQL 14.9 (Ubuntu 14.9-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
>
> Thanks,
> Gabriel



Re: Unused CTE affects result set

2023-11-17 Thread Tom Lane
Adrian Klaver  writes:
> On 11/17/23 10:30 AM, Gabriel Sánchez wrote:
>> I don't understand why the presence of a CTE that ends up not being 
>> included in the final query affects the result set. Could this be a 
>> bug, or is there something I don't understand?

> I would say it has to do with what the '...' is doing.

We'd need to see a complete example to do more than speculate.
The planner definitely does skip unreferenced SELECT CTEs, though.

regards, tom lane
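
A quick way to see the planner doing this is to put an obvious side effect in
the unreferenced CTE and look at the plan; a minimal sketch (none of this is
from Gabriel's original query):

EXPLAIN (COSTS OFF)
WITH used_cte AS (SELECT 1 AS a),
     unused_cte AS (SELECT pg_sleep(10))
SELECT * FROM used_cte;

The plan is a bare Result node: unused_cte is dropped and pg_sleep(10) never
runs. Note that a data-modifying CTE (INSERT/UPDATE/DELETE) is not skipped
this way; those are always executed.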




Re: Trigger to Count Number of Logical Replication Table Changes.

2023-11-17 Thread Juan Rodrigo Alejandro Burgos Mella
Hi,
The only thing that can be established is that if the record does not
exist, the trigger will not update anything in the table
"tst.time_audit_tbl" for the condition "table_name =
CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME)".

Maybe check before the UPDATE whether the record exists and, if it does
not, add it:

IF NOT EXISTS (SELECT 1 FROM tst.time_audit_tbl WHERE table_name =
    CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME))
THEN
    INSERT INTO tst.time_audit_tbl (table_name)
    VALUES (CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME));
END IF;
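
Since tst.time_audit_tbl has a UNIQUE constraint on table_name, an alternative
sketch is a single INSERT ... ON CONFLICT, which also avoids a race between
the existence check and the insert:

INSERT INTO tst.time_audit_tbl (table_name)
VALUES (CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME))
ON CONFLICT (table_name) DO NOTHING;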

Regards,
JRBM



On Thu, 16 Nov 2023 at 14:19, Avi Weinberg () wrote:

> I'm using Postgres (13 and 15) logical replication to sync data between two
> servers. I would like to have an update counter whenever data is changed.
> The counter can be incremented by 1 even if multiple rows are updated; it is
> also OK to increment the counter by the number of rows updated, but that
> seems less efficient to me.
> I need the counter to increase after the initial sync as well as after
> regular logical replication sync.
>
> Triggers do not work without ENABLE ALWAYS.
>
> In addition, if I try a trigger that is FOR EACH STATEMENT, it works only
> for the initial sync and not for regular logical replication sync.
>
> Having a per-row set_time_trig trigger takes about 1 minute when updating
> 50k rows in one transaction (all I need is to increase update_count by 1,
> so why spend a minute on it?). How can I improve this?
>
> CREATE TABLE IF NOT EXISTS tst.t2
> (
>     id bigint NOT NULL,
>     c1 int,
>     CONSTRAINT pk_t2 PRIMARY KEY (id)
> );
>
> CREATE TABLE IF NOT EXISTS tst.time_audit_tbl
> (
>     table_name character varying(63) COLLATE pg_catalog."default" NOT NULL,
>     update_count integer DEFAULT 0,
>     CONSTRAINT updated_time_audit_unique UNIQUE (table_name)
> );
>
> CREATE FUNCTION tst.set_time() RETURNS trigger
>     LANGUAGE plpgsql SECURITY DEFINER
> AS $$
> DECLARE
>     updated_count int;
> BEGIN
>     UPDATE tst.time_audit_tbl SET update_count = update_count + 1
>     WHERE table_name = CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME);
>     GET DIAGNOSTICS updated_count = ROW_COUNT;
>     IF updated_count = 0 THEN
>         RAISE EXCEPTION 'set_updated_time(). Table not found %.%',
>             TG_TABLE_SCHEMA, TG_TABLE_NAME;
>     END IF;
>     RETURN coalesce(NEW, OLD);
> END;
> $$;
>
> CREATE TRIGGER set_time_trig
>     AFTER INSERT OR DELETE OR UPDATE
>     ON tst.t2
>     FOR EACH ROW
>     EXECUTE FUNCTION tst.set_time();
>
> ALTER TABLE tst.t2 ENABLE ALWAYS TRIGGER set_time_trig;
>
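
As for the per-row cost asked about at the end: the trigger still has to fire
for every row, but the audit row only needs to be written once per
transaction. A sketch of that idea, assuming a new last_txid column is added
to tst.time_audit_tbl (the column name is made up here, and the missing-row
check is dropped for brevity):

ALTER TABLE tst.time_audit_tbl ADD COLUMN last_txid bigint;

CREATE OR REPLACE FUNCTION tst.set_time() RETURNS trigger
    LANGUAGE plpgsql SECURITY DEFINER
AS $$
BEGIN
    -- Only the first row changed in a given transaction writes the audit
    -- row; later rows see the matching last_txid and skip the UPDATE.
    UPDATE tst.time_audit_tbl
       SET update_count = update_count + 1,
           last_txid = txid_current()
     WHERE table_name = CONCAT(TG_TABLE_SCHEMA, '.', TG_TABLE_NAME)
       AND last_txid IS DISTINCT FROM txid_current();
    RETURN coalesce(NEW, OLD);
END;
$$;

Whether this helps depends on where the minute goes: if it is the 50k updates
of the same audit row (and the resulting row versions), this cuts them to one
per transaction, though the per-row function-call overhead remains.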


query plan

2023-11-17 Thread Torsten Förtsch
Hi,

This is part of a query plan:

 Nested Loop Left Join  (cost=26.32..47078866.36 rows=1344945195 width=626)
   ->  Nested Loop Left Join  (cost=25.74..5312.48 rows=1344945195 width=608)
         ->  Nested Loop Left Join  (cost=6.79..2876.77 rows=102 width=373)
               ->  Nested Loop Left Join  (cost=1.90..1965.51 rows=102 width=361)
               ->  Bitmap Heap Scan on ...  (cost=4.89..8.91 rows=2 width=28)
         ->  Hash Left Join  (cost=18.95..42.61 rows=3 width=243)
               ->  Hash Left Join  (cost=18.94..42.59 rows=3 width=203)
               ->  Hash  (cost=0.00..0.00 rows=1 width=48)
   ->  Memoize  (cost=0.58..4.59 rows=1 width=172)

What I don't understand is this: the left node of the join is expected to
return 102 rows, the right node 3. How can this result in >1e9 rows?

Much further down, the query involved a partitioned table with 2 partitions,
one pretty big (in the 1e9-row range), the other practically empty. The big
partition had been analyzed before, but the partitioned table and the empty
partition never had been. After analyzing them all was well.

I am just curious to understand how that number is calculated.

This is PG14.

Thanks.


Re: query plan

2023-11-17 Thread Tom Lane
Torsten Förtsch  writes:
> This is part of a query plan:

>  Nested Loop Left Join  (cost=26.32..47078866.36 rows=1344945195 width=626)
>    ->  Nested Loop Left Join  (cost=25.74..5312.48 rows=1344945195 width=608)
>          ->  Nested Loop Left Join  (cost=6.79..2876.77 rows=102 width=373)
>                ->  Nested Loop Left Join  (cost=1.90..1965.51 rows=102 width=361)
>                ->  Bitmap Heap Scan on ...  (cost=4.89..8.91 rows=2 width=28)
>          ->  Hash Left Join  (cost=18.95..42.61 rows=3 width=243)
>                ->  Hash Left Join  (cost=18.94..42.59 rows=3 width=203)
>                ->  Hash  (cost=0.00..0.00 rows=1 width=48)
>    ->  Memoize  (cost=0.58..4.59 rows=1 width=172)

> What I don't understand is this: the left node of the join is expected to
> return 102 rows, the right node 3. How can this result in >1e9 rows?

The rowcount estimate for the join is not derived by multiplying the
rowcount estimates of the two inputs.  Having said that, this looks
pretty inconsistent.  Can you show a test case that acts like that?

regards, tom lane
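
For the curious: the join size estimate comes from
calc_joinrel_size_estimate() in src/backend/optimizer/path/costsize.c.
Paraphrased (not the verbatim source), the left-join case is roughly:

    nrows = outer_rows * inner_rows * fkselec * jselec
    if nrows < outer_rows: nrows = outer_rows   (a left join emits at least its outer side)
    nrows = nrows * pselec                      (selectivity of pushed-down clauses)

The driver is the join selectivity estimated from column statistics, not the
two input row counts by themselves, so a never-analyzed table feeding default
statistics into that selectivity can blow the product up, which fits the
observation that all was well after ANALYZE.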