dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Matthew Tice
Hi All,

I'm trying to understand why there's a difference between what pgstattuple
reports and pg_stat_user_tables reports (for the number of dead tuples).

As I understand it, pgstattuple and pgstattuple_approx return the exact
number of dead tuples (as noted in the documentation), and, based on an older
Stack Overflow answer, the value returned by pg_stat_user_tables "uses the
most recent data collected by ANALYZE".

Why would it be that, even after analyzing the table, the n_dead_tup value is
still vastly different from dead_tuple_count?

> SELECT * FROM
    (SELECT dead_tuple_count FROM pgstattuple_approx('oban.oban_jobs')) a,
    (SELECT n_dead_tup, last_autovacuum, last_analyze, now(), autovacuum_count
     FROM pg_stat_user_tables
     WHERE relname = 'oban_jobs' AND schemaname = 'oban') b;
-[ RECORD 1 ]----+------------------------------
dead_tuple_count | 3736
n_dead_tup       | 1127044
last_autovacuum  | 2024-08-23 16:00:30.983141+00
last_analyze     | 2024-08-23 15:33:50.628422+00
now              | 2024-08-23 16:01:19.915893+00
autovacuum_count | 446478
SELECT 1

> vacuum (verbose,analyze) oban.oban_jobs;

vacuuming "oban.oban_jobs"
table "oban_jobs": index scan bypassed: 29341 pages from table (0.79% of
total) have 747 dead item identifiers
launched 2 parallel vacuum workers for index cleanup (planned: 2)
index "oban_jobs_args_index" now contains 18281 row versions in 10232 pages
0 index row versions were removed.
0 index pages were newly deleted.
56 index pages are currently deleted, of which 833 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
index "oban_jobs_meta_index" now contains 18281 row versions in 9698 pages
0 index row versions were removed.
0 index pages were newly deleted.
35 index pages are currently deleted, of which 621 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
table "oban_jobs": found 855 removable, 9661 nonremovable row versions in
29341 out of 3727204 pages
1330 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 3696951 frozen pages.
912 skipped pages using mintxid fork.
CPU: user: 0.12 s, system: 0.08 s, elapsed: 0.22 s.
vacuuming "pg_toast.pg_toast_72454950"
table "pg_toast_72454950": found 0 removable, 0 nonremovable row versions
in 0 out of 0 pages
0 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 skipped pages using mintxid fork.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
analyzing "oban.oban_jobs"
"oban_jobs": scanned 3 of 3727204 pages, containing 75 live rows and
10501 dead rows; 75 rows in sample, 9318 estimated total rows
VACUUM

> SELECT * FROM
    (SELECT dead_tuple_count FROM pgstattuple_approx('oban.oban_jobs')) a,
    (SELECT n_dead_tup, last_autovacuum, last_analyze, now(), autovacuum_count
     FROM pg_stat_user_tables
     WHERE relname = 'oban_jobs' AND schemaname = 'oban') b;
-[ RECORD 1 ]----+------------------------------
dead_tuple_count | 1701
n_dead_tup       | 1306009
last_autovacuum  | 2024-08-23 16:01:31.034229+00
last_analyze     | 2024-08-23 16:01:47.85574+00
now              | 2024-08-23 16:01:55.734589+00
autovacuum_count | 446479

This is a Google Alloy DB instance running:
> select version();
-[ RECORD 1 ]-
version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian clang
version 12.0.1, 64-bit
SELECT 1


Re: dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Matthew Tice
On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver wrote:

> On 8/23/24 09:14, Matthew Tice wrote:
> > Hi All,
> >
> > I'm trying to understand why there's a difference between what
> > pgstattuple reports and pg_stat_user_tables reports (for the number of
> > dead tuples).
> >
> > As I understand, pgstattuple and pgstattuple_approx return the exact
> > number of dead tuples (as noted in the documentation) and based on an
>
> https://www.postgresql.org/docs/current/pgstattuple.html
>
> pgstattuple_approx(regclass) returns record
>
>  pgstattuple_approx is a faster alternative to pgstattuple that
> returns approximate results.
>
> Not sure how you get exact count out of that?
>

Maybe the wording is a little confusing to me. Under the section
for pgstattuple_approx:
"pgstattuple_approx tries to avoid the full-table scan and returns exact
dead tuple statistics along with an approximation of the number and size of
live tuples and free space."
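
For reference, this is the comparison I have in mind (a hedged sketch;
pgstattuple does a full-table scan, pgstattuple_approx uses the visibility
map but still reports dead tuples exactly, and pg_stat_user_tables reports
the statistics collector's estimate):

SELECT (SELECT dead_tuple_count
        FROM pgstattuple('oban.oban_jobs'))        AS exact_dead,
       (SELECT dead_tuple_count
        FROM pgstattuple_approx('oban.oban_jobs')) AS approx_dead,
       (SELECT n_dead_tup
        FROM pg_stat_user_tables
        WHERE schemaname = 'oban'
          AND relname = 'oban_jobs')               AS collector_estimate;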


>
> > This is a Google Alloy DB instance running:
>
> https://cloud.google.com/alloydb/docs/overview
>
> "AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible
> database service that's designed for your most demanding workloads,
> including hybrid transactional and analytical processing. AlloyDB pairs
> a Google-built database engine with a cloud-based, multi-node
> architecture to deliver enterprise-grade performance, reliability, and
> availability."
>
> Where the important parts are 'PostgreSQL-compatible' and 'Google-built
> database engine'. You probably need to reach out to Google to see what
> that means for this situation.
>
Got it, thanks Adrian.


>
> >  > select version();
> > -[ RECORD 1 ]-
> > version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian
> > clang version 12.0.1, 64-bit
> > SELECT 1
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


datfrozenxid not dropping after vacuum

2021-09-01 Thread Matthew Tice
Hi,

Starting this morning at 0830 local time I noticed that my
datfrozenxid started moving past the `autovacuum_freeze_max_age` value
of 200000000.  When we encountered this in the past, the solution has
been to do one of the following:

1. This is related to an error similar to
```
found xmin 2675436435 from before relfrozenxid 321165377
```
Where the solution has been to move the `pg_internal.init` file out of
the way and let PostgreSQL recreate it.  Or,

2. A long-running transaction.  Typically I'll just find the `idle in
transaction` transactions that have a `query_start` around when my
alarm went off notifying me when `datfrozenxid` breaches
`autovacuum_freeze_max_age`.  Using a query similar to
```
SELECT pid, query_start, datname, usename, state, backend_xmin,
age(backend_xmin)
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

3. The autovacuum process seemed to be "stuck" on a particular table,
in which case we would kill the pid of the autovacuum worker (a sketch of
how I find it is below).
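
For item 3, something like this (a hedged sketch; `backend_type` in
pg_stat_activity identifies autovacuum workers, and `pg_cancel_backend`
is the gentler option before `pg_terminate_backend`):
```
SELECT pid, query_start, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE backend_type = 'autovacuum worker';

-- then, for the worker that looks stuck:
-- SELECT pg_cancel_backend(<pid>);
```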

The problem is that none of these solutions has seemed to drop
`datfrozenxid` back down, and there is one specific database in this
cluster that's holding onto it.

Using these queries from CrunchyData:

# Show oldest current xid
# WITH max_age AS (
SELECT 2000000000 as max_old_xid
, setting AS autovacuum_freeze_max_age
FROM pg_catalog.pg_settings
WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS (
SELECT datname
, m.max_old_xid::int
, m.autovacuum_freeze_max_age::int
, age(d.datfrozenxid) AS oldest_current_xid
FROM pg_catalog.pg_database d
JOIN max_age m ON (true)
WHERE d.datallowconn )
SELECT max(oldest_current_xid) AS oldest_current_xid
, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS
percent_towards_wraparound
, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float)))
AS percent_towards_emergency_autovac
FROM per_database_stats;

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          230935699 |                         12 |                               115

# Top 8 individual databases
SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC LIMIT 8;

       datname        |    age    | current_setting
----------------------+-----------+-----------------
 siteservice          | 230935699 | 200000000
 coupon-ws            | 217613246 | 200000000
 contextchangestore   | 211343280 | 200000000
 template0            | 210351639 | 200000000
 productmanager       | 207876167 | 200000000
 rhistory             | 207876167 | 200000000
 smsservice           | 207876167 | 200000000
 techservice          | 205432524 | 200000000

That top database, `siteservice`, is the "problematic" one: a manual
vacuum won't bring its `datfrozenxid` age back down.
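
The kind of manual vacuum I mean is roughly this (a hedged sketch, run
while connected to the siteservice database; FREEZE makes the vacuum
aggressive so relfrozenxid, and in turn datfrozenxid, can advance):
```
VACUUM (FREEZE, VERBOSE);

-- afterwards, re-check the per-database age:
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;
```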

Looking at the tables in that database:

# SELECT c.oid::regclass
, age(c.relfrozenxid)
, pg_size_pretty(pg_total_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n on c.relnamespace = n.oid
WHERE relkind IN ('r', 't', 'm')
AND n.nspname NOT IN ('pg_toast')
ORDER BY 2 DESC LIMIT 10;

        oid         |    age    | pg_size_pretty
--------------------+-----------+----------------
 pg_database        | 230935699 | 4264 kB
 pg_proc            |  93543215 | 976 kB
 pg_collation       |  93543215 | 560 kB
 pg_attribute       |  93543215 | 600 kB
 pg_shdepend        |  59515320 | 15 MB
 pg_statistic       |  53828900 | 464 kB
 pg_subscription    |  53172718 | 16 kB
 pg_pltemplate      |  53172718 | 56 kB
 pg_authid          |  53172718 | 8616 kB
 pg_db_role_setting |  53172718 | 64 kB

I thought maybe it had to do with my replication slots somehow:

# select slot_name, slot_type, database, active, catalog_xmin, restart_lsn, confirmed_flush_lsn from pg_replication_slots;

        slot_name        | slot_type |        database        | active | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
-------------------------+-----------+------------------------+--------+--------------+---------------+---------------------
 dbs1db02                | physical  |                        | f      |              |               |
 dbs1db01                | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs1db03                | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs2db01                | physical  |                        | t      |              | 4D25/ACE6EE08 |
 debezium_cmanager       | logical   | campaign-manager       | t      |   2152258063 | 4D25/A421A6C8 | 4D25/ABC18C88
 debezium_rservice       | logical   | retail-content-service | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A6105DF8
 debezium_partnerservice | logical   | partnerservice         | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A5446630
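
To see whether a slot could be holding the xmin horizon back, a hedged
sketch of the follow-up check (how old each slot's xmin / catalog_xmin is):
```
SELECT slot_name, slot_type, active,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
ORDER BY catalog_xmin_age DESC NULLS LAST;
```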
 

Re: datfrozenxid not dropping after vacuum

2021-09-01 Thread Matthew Tice
Hi Alvaro, thanks for the quick reply.

I'm scheduled to do my patching maintenance at the end of this month -
but at this point I don't think I'm going to make it.

Other than patching, is there a workaround?  For example, in #2 above:
>The fix for 2) is simpler,
>simply always remove both the shared and local init files.

I'm not familiar with the differences between 'shared' and 'local'
init files (I'd imagine I referenced a 'local' file in my original
post)?



Thanks!

Matt

On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera  wrote:
>
> On 2021-Sep-01, Matthew Tice wrote:
>
> [ problem table is pg_database ]
>
> > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> > standby databases have been patched to 10.17.
>
> Hmm, I think there was a bug in the early 10.x versions where advancing
> the xid age of shared tables would not work correctly for some reason ...
> Ah yes, this was fixed in 10.5, a mere three years ago:
>
> Author: Andres Freund 
> Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
> Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
> Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
> Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
> Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
> Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700
>
> Fix bugs in vacuum of shared rels, by keeping their relcache entries 
> current.
>
> When vacuum processes a relation it uses the corresponding relcache
> entry's relfrozenxid / relminmxid as a cutoff for when to remove
> tuples etc. Unfortunately for nailed relations (i.e. critical system
> catalogs) bugs could frequently lead to the corresponding relcache
> entry being stale.
>
> This set of bugs could cause actual data corruption as vacuum would
> potentially not remove the correct row versions, potentially reviving
> them at a later point.  After 699bf7d05c some corruptions in this vein
> were prevented, but the additional error checks could also trigger
> spuriously. Examples of such errors are:
>   ERROR: found xmin ... from before relfrozenxid ...
> and
>   ERROR: found multixact ... from before relminmxid ...
> To be caused by this bug the errors have to occur on system catalog
> tables.
>
> The two bugs are:
>
> 1) Invalidations for nailed relations were ignored, based on the
>theory that the relcache entry for such tables doesn't
>change. Which is largely true, except for fields like relfrozenxid
>etc.  This means that changes to relations vacuumed in other
>sessions weren't picked up by already existing sessions.  Luckily
>autovacuum doesn't have particularly longrunning sessions.
>
> 2) For shared *and* nailed relations, the shared relcache init file
>was never invalidated while running.  That means that for such
>tables (e.g. pg_authid, pg_database) it's not just already existing
>sessions that are affected, but even new connections are as well.
>That explains why the reports usually were about pg_authid et. al.
>
> To fix 1), revalidate the rd_rel portion of a relcache entry when
> invalid. This implies a bit of extra complexity to deal with
> bootstrapping, but it's not too bad.  The fix for 2) is simpler,
> simply always remove both the shared and local init files.
>
> Author: Andres Freund
> Reviewed-By: Alvaro Herrera
> Discussion:
> https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de
> 
> https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bpgg+_gdmxe25tvuy4s...@mail.gmail.com
> 
> https://postgr.es/m/cakmfjucqbuodrfxpdx39wha3vjyxwerg_zdvxzncr6+5wog...@mail.gmail.com
> 
> https://postgr.es/m/cagewt-ujgpmlq09gxcufmzazsgjc98vxhefbf-tppb0fb13...@mail.gmail.com
> Backpatch: 9.3-
>
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W
> "El número de instalaciones de UNIX se ha elevado a 10,
> y se espera que este número aumente" (UPM, 1972)




Re: datfrozenxid not dropping after vacuum

2021-09-02 Thread Matthew Tice
Interestingly enough, I hopped on the database system this morning and
found the `datfrozenxid` dropped back down below
`autovacuum_freeze_max_age` around 0200 local time (roughly 18 hours
after the fact).

Looking through the Postgresql logs I don't see anything standing out
at that time.

I still plan on patching to 10.17 tonight.

Matt

On Wed, Sep 1, 2021 at 4:01 PM Matthew Tice  wrote:
>
> Hi Alvaro, thanks for the quick reply.
>
> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
>
> Other than patching, is there a workaround?  For example, in #2 above:
> >The fix for 2) is simpler,
> >simply always remove both the shared and local init files.
>
> I'm not familiar with the differences between 'shared' and 'local'
> init files (I'd imagine I referenced a 'local' file in my original
> post)?
>
>
>
> Thanks!
>
> Matt




Re: What are best practices wrt passwords?

2024-10-16 Thread Matthew Tice



> On Oct 16, 2024, at 10:50 AM, Christophe Pettus  wrote:
> 
> 
> 
>> On Oct 16, 2024, at 09:47, Tom Lane  wrote:
>> I believe it depends on your platform --- some BSDen are pretty
>> permissive about this, if memory serves.  On a Linux box it seems
>> to work for processes owned by yourself even if you're not superuser.
> 
> I just tried it on an (admittedly kind of old) Ubuntu system and macOS 14,
> and it looks like it shows everything owned by everyone, even from a
> non-sudoer user.
> 
Interesting, that’s not my experience.  Only root can see the env variables of 
another user.

Terminal 1

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ whoami
testusr

$ export FOOBAR=true

$ bash

$ env | grep FOOBAR
FOOBAR=true

Terminal 2
$  whoami
mtice

$  ps e -U testusr | grep -c FOOBAR
0

$  sudo ps e -U testusr | grep -c FOOBAR
1





Sudden increase in n_dead_tup with no corresponding insert/update/delete

2025-06-03 Thread Matthew Tice
Hi all,

While investigating some potential vacuum improvements to make to a table I 
happened to notice that one table (along with others) will suddenly increase 
the number of n_dead_tup reported in pg_stat_user_tables without a 
corresponding increase in the inserts, updates, or deletes.

For instance, running this query in a 1 second loop

select * from pg_stat_user_tables where relname = 
'casino_account_history_lines';
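
As an aside, a hedged psql sketch of such a loop, with pg_stat_progress_vacuum 
joined in so the autovacuum run shows up next to the counters (\watch re-runs 
the previous query every second):

select s.n_dead_tup, s.n_mod_since_analyze,
       s.last_autovacuum, s.last_autoanalyze,
       p.phase, p.heap_blks_scanned, p.heap_blks_total
from pg_stat_user_tables s
left join pg_stat_progress_vacuum p on p.relid = s.relid
where s.relname = 'casino_account_history_lines';
\watch 1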

I can see the n_dead_tup column increase until the autovacuum process 
finishes vacuuming the table.  Example:

-[ RECORD 1 ]---+--
relid   | 33378
schemaname  | public
relname | casino_account_history_lines
seq_scan| 1122
seq_tup_read| 178229588443
idx_scan| 456779105
idx_tup_fetch   | 5539267637
n_tup_ins   | 45093031
n_tup_upd   | 47289203
n_tup_del   | 0
n_tup_hot_upd   | 0
n_live_tup  | 1646966715
n_dead_tup  | 1356331
n_mod_since_analyze | 11498
n_ins_since_vacuum  | 6288
last_vacuum | 2025-06-03 14:57:43.46009+00
last_autovacuum | 2025-06-03 19:09:21.595322+00
last_analyze| 2025-06-03 14:57:54.848185+00
last_autoanalyze| 2025-06-03 19:09:48.390396+00
vacuum_count| 2
autovacuum_count| 3973
analyze_count   | 6
autoanalyze_count   | 3078

--
-- At this point the table is no longer in pg_stat_progress_vacuum and 
`n_dead_tup` has dropped from 1356331 to 4302
--

-[ RECORD 1 ]---+--
relid   | 33378
schemaname  | public
relname | casino_account_history_lines
seq_scan| 1122
seq_tup_read| 178229588443
idx_scan| 456779364
idx_tup_fetch   | 5539267804
n_tup_ins   | 45093063
n_tup_upd   | 47289232
n_tup_del   | 0
n_tup_hot_upd   | 0
n_live_tup  | 1646961282
n_dead_tup  | 4302
n_mod_since_analyze | 11559
n_ins_since_vacuum  | 2
last_vacuum | 2025-06-03 14:57:43.46009+00
last_autovacuum | 2025-06-03 19:12:48.107816+00
last_analyze| 2025-06-03 14:57:54.848185+00
last_autoanalyze| 2025-06-03 19:09:48.390396+00
vacuum_count| 2
autovacuum_count| 3974
analyze_count   | 6
autoanalyze_count   | 3078

--

This seems normal to me. However, while still watching pg_stat_user_tables in 
a loop, `n_dead_tup` steadily increases (to `5038` in this latest run), and then, 
one second later, the number jumps to above 1.2 million:

Tue 03 Jun 2025 07:13:11 PM UTC (every 1s)

-[ RECORD 1 ]---+--
relid   | 33378
schemaname  | public
relname | casino_account_history_lines
seq_scan| 1122
seq_tup_read| 178229588443
idx_scan| 456784246
idx_tup_fetch   | 5539271612
n_tup_ins   | 45093719
n_tup_upd   | 47289968
n_tup_del   | 0
n_tup_hot_upd   | 0
n_live_tup  | 1646961938
n_dead_tup  | 5038
n_mod_since_analyze | 12951
n_ins_since_vacuum  | 658
last_vacuum | 2025-06-03 14:57:43.46009+00
last_autovacuum | 2025-06-03 19:12:48.107816+00
last_analyze| 2025-06-03 14:57:54.848185+00
last_autoanalyze| 2025-06-03 19:09:48.390396+00
vacuum_count| 2
autovacuum_count| 3974
analyze_count   | 6
autoanalyze_count   | 3078

Tue 03 Jun 2025 07:13:12 PM UTC (every 1s)

-[ RECORD 1 ]---+--
relid   | 33378
schemaname  | public
relname | casino_account_history_lines
seq_scan| 1122
seq_tup_read| 178229588443
idx_scan| 456784464
idx_tup_fetch   | 5539271752
n_tup_ins   | 45093746
n_tup_upd   | 47289993
n_tup_del   | 0
n_tup_hot_upd   | 0
n_live_tup  | 1647255972
n_dead_tup  | 1290579
n_mod_since_analyze | 2
n_ins_since_vacuum  | 685
last_vacuum | 2025-06-03 14:57:43.46009+00
last_autovacuum | 2025-06-03 19:12:48.107816+00
last_analyze| 2025-06-03 14:57:54.848185+00
last_autoanalyze| 2025-06-03 19:13:12.125828+00
vacuum_count| 2
autovacuum_count| 3974
analyze_count   | 6
autoanalyze_count   | 3079

I don't understand where this large increase is coming from when there are no 
corresponding inserts, updates, or deletes at that magnitude.  This entire 
process repeats itself and, as mentioned, the same thing is happening on other 
observed tables.

I'm running version 'PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1)'

Thanks,
Matt



Re: Sudden increase in n_dead_tup with no corresponding insert/update/delete

2025-06-04 Thread Matthew Tice


> On Jun 3, 2025, at 6:23 PM, David Rowley  wrote:
> 
> On Wed, 4 Jun 2025 at 07:22, Matthew Tice  wrote:
>> Tue 03 Jun 2025 07:13:11 PM UTC (every 1s)
>> n_dead_tup  | 5038
>> autoanalyze_count   | 3078
> 
>> Tue 03 Jun 2025 07:13:12 PM UTC (every 1s)
>> n_dead_tup  | 1290579
>> autoanalyze_count   | 3079
> 
>> I don't understand where this large increase is coming from when there are 
>> no corresponding inserts, updates, or deletes (at the magnitude).  This 
>> entire process repeats itself and, as mentioned, the same thing is happening 
>> on other observed tables.
> 
> I imagine it's from the auto-analyze that ran. Analyze will try to
> estimate the live and dead rows, but since analyze only samples some
> blocks, it may come up with something that's not too accurate if the
> blocks it happened to sample don't contain similar percentages of dead
> rows than the entire table.
> 
> See [1].
> 
> David
> 
> [1] 
> https://github.com/postgres/postgres/blob/REL_15_STABLE/src/backend/commands/analyze.c#L1318

Thanks, David.  

This table is relatively large (1.6B records, 1.5TB, 38 columns).  The 
`default_statistics_target` is set to 300 - so I think that 9 may not be 
enough to gather accurate statistics.
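
As a hedged sketch of what I mean: ANALYZE samples roughly 300 * the statistics 
target rows (on the order of 90,000 here) out of a 1.6B-row table, so the 
dead-row extrapolation can swing widely.  One option would be raising the 
per-column statistics target for the relevant columns and re-analyzing (the 
column name below is illustrative):

show default_statistics_target;

alter table casino_account_history_lines
  alter column account_id set statistics 1000;  -- illustrative column name
analyze casino_account_history_lines;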