dead tuple difference between pgstattuple and pg_stat_user_tables
Hi All,

I'm trying to understand why there's a difference between what pgstattuple
reports and pg_stat_user_tables reports (for the number of dead tuples).

As I understand, pgstattuple and pgstattuple_approx return the exact number
of dead tuples (as noted in the documentation) and, based on an older Stack
Overflow answer, the value returned from pg_stat_user_tables "uses the most
recent data collected by ANALYZE". Why would it be that even after analyzing
a table the n_dead_tup value is still vastly different than dead_tuple_count?

> SELECT * FROM
    (SELECT dead_tuple_count from pgstattuple_approx('oban.oban_jobs')) a,
    (SELECT n_dead_tup, last_autovacuum, last_analyze, now(), autovacuum_count
     FROM pg_stat_user_tables
     WHERE relname = 'oban_jobs' and schemaname = 'oban') b;

-[ RECORD 1 ]-
dead_tuple_count | 3736
n_dead_tup       | 1127044
last_autovacuum  | 2024-08-23 16:00:30.983141+00
last_analyze     | 2024-08-23 15:33:50.628422+00
now              | 2024-08-23 16:01:19.915893+00
autovacuum_count | 446478

SELECT 1

> vacuum (verbose,analyze) oban.oban_jobs;
vacuuming "oban.oban_jobs"
table "oban_jobs": index scan bypassed: 29341 pages from table (0.79% of total) have 747 dead item identifiers
launched 2 parallel vacuum workers for index cleanup (planned: 2)
index "oban_jobs_args_index" now contains 18281 row versions in 10232 pages
0 index row versions were removed.
0 index pages were newly deleted.
56 index pages are currently deleted, of which 833 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
index "oban_jobs_meta_index" now contains 18281 row versions in 9698 pages
0 index row versions were removed.
0 index pages were newly deleted.
35 index pages are currently deleted, of which 621 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
table "oban_jobs": found 855 removable, 9661 nonremovable row versions in 29341 out of 3727204 pages
1330 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 3696951 frozen pages.
912 skipped pages using mintxid fork.
CPU: user: 0.12 s, system: 0.08 s, elapsed: 0.22 s.
vacuuming "pg_toast.pg_toast_72454950"
table "pg_toast_72454950": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
0 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 skipped pages using mintxid fork.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
analyzing "oban.oban_jobs"
"oban_jobs": scanned 3 of 3727204 pages, containing 75 live rows and 10501 dead rows; 75 rows in sample, 9318 estimated total rows
VACUUM

> SELECT * FROM
    (SELECT dead_tuple_count from pgstattuple_approx('oban.oban_jobs')) a,
    (SELECT n_dead_tup, last_autovacuum, last_analyze, now(), autovacuum_count
     FROM pg_stat_user_tables
     WHERE relname = 'oban_jobs' and schemaname = 'oban') b;

-[ RECORD 1 ]-
dead_tuple_count | 1701
n_dead_tup       | 1306009
last_autovacuum  | 2024-08-23 16:01:31.034229+00
last_analyze     | 2024-08-23 16:01:47.85574+00
now              | 2024-08-23 16:01:55.734589+00
autovacuum_count | 446479

This is a Google Alloy DB instance running:

> select version();
-[ RECORD 1 ]-
version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
SELECT 1
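For reference, a more compact form of the same comparison - a minimal sketch,
assuming the pgstattuple extension is installed on the same oban.oban_jobs
table; scanned_percent shows how much of the table pgstattuple_approx
actually had to scan:

```
SELECT s.n_dead_tup,
       s.n_live_tup,
       s.last_autovacuum,
       s.last_analyze,
       a.dead_tuple_count,
       a.scanned_percent   -- portion of the table pgstattuple_approx really scanned
FROM pg_stat_user_tables AS s,
     pgstattuple_approx('oban.oban_jobs') AS a
WHERE s.schemaname = 'oban'
  AND s.relname = 'oban_jobs';
```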
Re: dead tuple difference between pgstattuple and pg_stat_user_tables
On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver wrote:
> On 8/23/24 09:14, Matthew Tice wrote:
> > Hi All,
> >
> > I'm trying to understand why there's a difference between what
> > pgstattuple reports and pg_stat_user_tables reports (for the number of
> > dead tuples).
> >
> > As I understand, pgstattuple and pgstattuple_approx return the exact
> > number of dead tuples (as noted in the documentation) and based on an
>
> https://www.postgresql.org/docs/current/pgstattuple.html
>
> pgstattuple_approx(regclass) returns record
>
>     pgstattuple_approx is a faster alternative to pgstattuple that
>     returns approximate results.
>
> Not sure how you get exact count out of that?

Maybe the wording is a little confusing to me. Under the section for
pgstattuple_approx:

"pgstattuple_approx tries to avoid the full-table scan and returns exact
dead tuple statistics along with an approximation of the number and size of
live tuples and free space."

> > This is a Google Alloy DB instance running:
>
> https://cloud.google.com/alloydb/docs/overview
>
> "AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible
> database service that's designed for your most demanding workloads,
> including hybrid transactional and analytical processing. AlloyDB pairs
> a Google-built database engine with a cloud-based, multi-node
> architecture to deliver enterprise-grade performance, reliability, and
> availability."
>
> Where the important parts are 'PostgreSQL-compatible' and 'Google-built
> database engine'. You probably need to reach out to Google to see what
> that means for this situation.

Got it, thanks Adrian.

> > select version();
> > -[ RECORD 1 ]-
> > version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian
> > clang version 12.0.1, 64-bit
> > SELECT 1
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
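To make the distinction concrete, a minimal sketch (assuming the pgstattuple
extension and the same table) that puts the full-scan count next to the
visibility-map-based one - per the documentation text quoted above, both
dead_tuple_count values are exact; only the live-tuple and free-space figures
of the approx variant are estimates:

```
SELECT 'pgstattuple (full scan)' AS source, dead_tuple_count
FROM pgstattuple('oban.oban_jobs')
UNION ALL
SELECT 'pgstattuple_approx (visibility map)', dead_tuple_count
FROM pgstattuple_approx('oban.oban_jobs');
```

Keep in mind that pgstattuple reads the whole relation, so on a table of 3.7
million pages that is not a cheap check.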
datfrozenxid not dropping after vacuum
Hi,

Starting this morning at 0830 local time I noticed that my datfrozenxid
started moving past the `autovacuum_freeze_max_age` value of 200000000.

When we encountered this in the past the solution has been to do one of the
following:

1. This is related to an error similar to
```
found xmin 2675436435 from before relfrozenxid 321165377
```
Where the solution has been to move the `pg_internal.init` file out of the
way and let Postgresql recreate it. Or;

2. A long-running transaction. Typically I'll just find the `idle in
transaction` transactions that have a `query_start` around when my alarm went
off notifying me that `datfrozenxid` breached `autovacuum_freeze_max_age`,
using a query similar to
```
SELECT pid, query_start, datname, usename, state, backend_xmin, age(backend_xmin)
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

3. The autovacuum process seemed to be "stuck" on a particular table. We
would kill the pid of the autovacuum process.

The problem is that none of these solutions has dropped `datfrozenxid` back
down, and there is one specific database in this cluster that's holding onto
it.

Using these queries from CrunchyData:

# Show oldest current xid
WITH max_age AS (
    SELECT 2000000000 as max_old_xid
        , setting AS autovacuum_freeze_max_age
    FROM pg_catalog.pg_settings
    WHERE name = 'autovacuum_freeze_max_age'
)
, per_database_stats AS (
    SELECT datname
        , m.max_old_xid::int
        , m.autovacuum_freeze_max_age::int
        , age(d.datfrozenxid) AS oldest_current_xid
    FROM pg_catalog.pg_database d
    JOIN max_age m ON (true)
    WHERE d.datallowconn
)
SELECT max(oldest_current_xid) AS oldest_current_xid
    , max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS percent_towards_wraparound
    , max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float))) AS percent_towards_emergency_autovac
FROM per_database_stats;

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          230935699 |                         12 |                               115

# Top 8 individual databases
SELECT datname
    , age(datfrozenxid)
    , current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC
LIMIT 8;

      datname       |    age    | current_setting
--------------------+-----------+-----------------
 siteservice        | 230935699 | 200000000
 coupon-ws          | 217613246 | 200000000
 contextchangestore | 211343280 | 200000000
 template0          | 210351639 | 200000000
 productmanager     | 207876167 | 200000000
 rhistory           | 207876167 | 200000000
 smsservice         | 207876167 | 200000000
 techservice        | 205432524 | 200000000

That top database `siteservice` is the "problematic" one where a manual
vacuum won't coerce it to free up the `datfrozenxid`.
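As a side note, a diagnostic sketch I keep around for this situation (not
specific to this incident) lists the usual things that can pin the xmin
horizon and keep vacuum from advancing relfrozenxid - long-running backends
in any state, plus forgotten prepared transactions:

```
-- Backends holding the oldest xmin, regardless of state
SELECT pid, datname, usename, state, xact_start,
       backend_xmin, age(backend_xmin) AS xmin_age
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;

-- Prepared transactions are easy to forget and pin the horizon the same way
SELECT gid, prepared, owner, database, age(transaction) AS xid_age
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC;
```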
Looking at the tables in that database:

# SELECT c.oid::regclass
    , age(c.relfrozenxid)
    , pg_size_pretty(pg_total_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n on c.relnamespace = n.oid
WHERE relkind IN ('r', 't', 'm')
  AND n.nspname NOT IN ('pg_toast')
ORDER BY 2 DESC
LIMIT 10;

        oid         |    age    | pg_size_pretty
--------------------+-----------+----------------
 pg_database        | 230935699 | 4264 kB
 pg_proc            |  93543215 | 976 kB
 pg_collation       |  93543215 | 560 kB
 pg_attribute       |  93543215 | 600 kB
 pg_shdepend        |  59515320 | 15 MB
 pg_statistic       |  53828900 | 464 kB
 pg_subscription    |  53172718 | 16 kB
 pg_pltemplate      |  53172718 | 56 kB
 pg_authid          |  53172718 | 8616 kB
 pg_db_role_setting |  53172718 | 64 kB

I thought maybe it had to do with my replication slots somehow:

# select slot_name, slot_type, database, active, catalog_xmin, restart_lsn, confirmed_flush_lsn
  from pg_replication_slots;

        slot_name        | slot_type |        database        | active | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
-------------------------+-----------+------------------------+--------+--------------+---------------+---------------------
 dbs1db02                | physical  |                        | f      |              |               |
 dbs1db01                | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs1db03                | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs2db01                | physical  |                        | t      |              | 4D25/ACE6EE08 |
 debezium_cmanager       | logical   | campaign-manager       | t      |   2152258063 | 4D25/A421A6C8 | 4D25/ABC18C88
 debezium_rservice       | logical   | retail-content-service | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A6105DF8
 debezium_partnerservice | logical   | partnerservice         | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A5446630
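The catalog_xmin values above are easier to judge as ages - a small sketch; a
logical slot that has fallen far behind would show a large catalog_xmin_age
here:

```
SELECT slot_name,
       slot_type,
       database,
       active,
       age(xmin)         AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
ORDER BY age(catalog_xmin) DESC NULLS LAST;
```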
Re: datfrozenxid not dropping after vacuum
Hi Alvaro, thanks for the quick reply.

I'm scheduled to do my patching maintenance at the end of this month - but at
this point I don't think I'm going to make it.

Other than patching, is there a work around? For example, in #2 above:
> The fix for 2) is simpler,
> simply always remove both the shared and local init files.

I'm not familiar with the differences between 'shared' and 'local' init files
(I'd imagine I referenced a 'local' file in my original post)?

Thanks!

Matt

On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera wrote:
>
> On 2021-Sep-01, Matthew Tice wrote:
>
> [ problem table is pg_database ]
>
> > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> > standby databases have been patched to 10.17.
>
> Hmm, I think there was a bug in the early 10.x versions where advancing
> the xid age of shared tables would not work correctly for some reason ...
> Ah yes, this was fixed in 10.5, a mere three years ago:
>
> Author: Andres Freund
> Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
> Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
> Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
> Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
> Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
> Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700
>
>     Fix bugs in vacuum of shared rels, by keeping their relcache entries
>     current.
>
>     When vacuum processes a relation it uses the corresponding relcache
>     entry's relfrozenxid / relminmxid as a cutoff for when to remove
>     tuples etc. Unfortunately for nailed relations (i.e. critical system
>     catalogs) bugs could frequently lead to the corresponding relcache
>     entry being stale.
>
>     This set of bugs could cause actual data corruption as vacuum would
>     potentially not remove the correct row versions, potentially reviving
>     them at a later point. After 699bf7d05c some corruptions in this vein
>     were prevented, but the additional error checks could also trigger
>     spuriously. Examples of such errors are:
>     ERROR: found xmin ... from before relfrozenxid ...
>     and
>     ERROR: found multixact ... from before relminmxid ...
>     To be caused by this bug the errors have to occur on system catalog
>     tables.
>
>     The two bugs are:
>
>     1) Invalidations for nailed relations were ignored, based on the
>        theory that the relcache entry for such tables doesn't
>        change. Which is largely true, except for fields like relfrozenxid
>        etc. This means that changes to relations vacuumed in other
>        sessions weren't picked up by already existing sessions. Luckily
>        autovacuum doesn't have particularly longrunning sessions.
>
>     2) For shared *and* nailed relations, the shared relcache init file
>        was never invalidated while running. That means that for such
>        tables (e.g. pg_authid, pg_database) it's not just already existing
>        sessions that are affected, but even new connections are as well.
>        That explains why the reports usually were about pg_authid et. al.
>
>     To fix 1), revalidate the rd_rel portion of a relcache entry when
>     invalid. This implies a bit of extra complexity to deal with
>     bootstrapping, but it's not too bad. The fix for 2) is simpler,
>     simply always remove both the shared and local init files.
>
>     Author: Andres Freund
>     Reviewed-By: Alvaro Herrera
>     Discussion:
>         https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de
>         https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bpgg+_gdmxe25tvuy4s...@mail.gmail.com
>         https://postgr.es/m/cakmfjucqbuodrfxpdx39wha3vjyxwerg_zdvxzncr6+5wog...@mail.gmail.com
>         https://postgr.es/m/cagewt-ujgpmlq09gxcufmzazsgjc98vxhefbf-tppb0fb13...@mail.gmail.com
>     Backpatch: 9.3-
>
>
> --
> Álvaro Herrera 39°49'30"S 73°17'W
> "El número de instalaciones de UNIX se ha elevado a 10,
> y se espera que este número aumente" (UPM, 1972)
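Not from the thread, but a sketch of how to check whether the shared catalog
itself is what is stuck (assuming superuser access): from a brand-new
connection, look at the age of pg_database's relfrozenxid, vacuum it
aggressively, and look again. Since datfrozenxid can only advance once every
table in the database - including shared catalogs such as pg_database - has
been vacuumed, a pg_database age that refuses to drop would point toward the
stale shared relcache init file described in the commit message above.

```
-- From a fresh connection (new connections read the relcache init files anew)
SELECT relname, age(relfrozenxid) AS frozenxid_age
FROM pg_class
WHERE relname IN ('pg_database', 'pg_authid');

-- Aggressive vacuum of the shared catalog (requires superuser)
VACUUM (FREEZE, VERBOSE) pg_database;

-- Re-check; datfrozenxid should follow once pg_database's relfrozenxid advances
SELECT datname, age(datfrozenxid)
FROM pg_database
ORDER BY 2 DESC
LIMIT 5;
```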
Re: datfrozenxid not dropping after vacuum
Interestingly enough, I hopped on the database system this morning and found
the `datfrozenxid` dropped back down below `autovacuum_freeze_max_age` around
0200 local time (roughly 18 hours after the fact). Looking through the
Postgresql logs I don't see anything standing out at that time.

I still plan on patching to 10.17 tonight.

Matt

On Wed, Sep 1, 2021 at 4:01 PM Matthew Tice wrote:
>
> Hi Alvaro, thanks for the quick reply.
>
> I'm scheduled to do my patching maintenance at the end of this month -
> but at this point I don't think I'm going to make it.
>
> Other than patching, is there a work around? For example, in #2 above:
> > The fix for 2) is simpler,
> > simply always remove both the shared and local init files.
>
> I'm not familiar with the differences between 'shared' and 'local'
> init files (I'd imagine I referenced a 'local' file in my original
> post)?
>
> Thanks!
>
> Matt
>
> On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera wrote:
> >
> > On 2021-Sep-01, Matthew Tice wrote:
> >
> > [ problem table is pg_database ]
> >
> > > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> > > standby databases have been patched to 10.17.
> >
> > Hmm, I think there was a bug in the early 10.x versions where advancing
> > the xid age of shared tables would not work correctly for some reason ...
> > Ah yes, this was fixed in 10.5, a mere three years ago:
> >
> > Author: Andres Freund
> > Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
> > Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
> > Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
> > Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
> > Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
> > Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700
> >
> >     Fix bugs in vacuum of shared rels, by keeping their relcache entries
> >     current.
> >
> >     When vacuum processes a relation it uses the corresponding relcache
> >     entry's relfrozenxid / relminmxid as a cutoff for when to remove
> >     tuples etc. Unfortunately for nailed relations (i.e. critical system
> >     catalogs) bugs could frequently lead to the corresponding relcache
> >     entry being stale.
> >
> >     This set of bugs could cause actual data corruption as vacuum would
> >     potentially not remove the correct row versions, potentially reviving
> >     them at a later point. After 699bf7d05c some corruptions in this vein
> >     were prevented, but the additional error checks could also trigger
> >     spuriously. Examples of such errors are:
> >     ERROR: found xmin ... from before relfrozenxid ...
> >     and
> >     ERROR: found multixact ... from before relminmxid ...
> >     To be caused by this bug the errors have to occur on system catalog
> >     tables.
> >
> >     The two bugs are:
> >
> >     1) Invalidations for nailed relations were ignored, based on the
> >        theory that the relcache entry for such tables doesn't
> >        change. Which is largely true, except for fields like relfrozenxid
> >        etc. This means that changes to relations vacuumed in other
> >        sessions weren't picked up by already existing sessions. Luckily
> >        autovacuum doesn't have particularly longrunning sessions.
> >
> >     2) For shared *and* nailed relations, the shared relcache init file
> >        was never invalidated while running. That means that for such
> >        tables (e.g. pg_authid, pg_database) it's not just already existing
> >        sessions that are affected, but even new connections are as well.
> >        That explains why the reports usually were about pg_authid et. al.
> >
> >     To fix 1), revalidate the rd_rel portion of a relcache entry when
> >     invalid. This implies a bit of extra complexity to deal with
> >     bootstrapping, but it's not too bad. The fix for 2) is simpler,
> >     simply always remove both the shared and local init files.
> >
> >     Author: Andres Freund
> >     Reviewed-By: Alvaro Herrera
> >     Discussion:
> >         https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de
> >         https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bpgg+_gdmxe25tvuy4s...@mail.gmail.com
> >         https://postgr.es/m/cakmfjucqbuodrfxpdx39wha3vjyxwerg_zdvxzncr6+5wog...@mail.gmail.com
> >         https://postgr.es/m/cagewt-ujgpmlq09gxcufmzazsgjc98vxhefbf-tppb0fb13...@mail.gmail.com
> >     Backpatch: 9.3-
> >
> >
> > --
> > Álvaro Herrera 39°49'30"S 73°17'W
> > "El número de instalaciones de UNIX se ha elevado a 10,
> > y se espera que este número aumente" (UPM, 1972)
Re: What are best practices wrt passwords?
> On Oct 16, 2024, at 10:50 AM, Christophe Pettus wrote:
>
>> On Oct 16, 2024, at 09:47, Tom Lane wrote:
>> I believe it depends on your platform --- some BSDen are pretty
>> permissive about this, if memory serves. On a Linux box it seems
>> to work for processes owned by yourself even if you're not superuser.
>
> I just tried it on an (admittedly kind of old) Ubuntu system and MacOS 14,
> and it looks like it shows everything owned by everyone, even from a
> non-sudoer user.

Interesting, that's not my experience. Only root can see the env variables of
another user.

Terminal 1

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
$ whoami
testusr
$ export FOOBAR=true
$ bash
$ env | grep FOOBAR
FOOBAR=true

Terminal 2

$ whoami
mtice
$ ps e -U testusr | grep -c FOOBAR
0
$ sudo ps e -U testusr | grep -c FOOBAR
1
Sudden increase in n_dead_tup with no corresponding insert/update/delete
Hi all,

While investigating some potential vacuum improvements to make to a table I
happened to notice that one table (along with others) will suddenly increase
the number of n_dead_tup reported in pg_stat_user_tables without a
corresponding increase in the inserts, updates, or deletes.

For instance, running this query in a 1 second loop

select * from pg_stat_user_tables where relname = 'casino_account_history_lines';

I can see the n_dead_tup column increase until the autovacuum process
finishes vacuuming the table. Example:

-[ RECORD 1 ]-------+------------------------------
relid               | 33378
schemaname          | public
relname             | casino_account_history_lines
seq_scan            | 1122
seq_tup_read        | 178229588443
idx_scan            | 456779105
idx_tup_fetch       | 5539267637
n_tup_ins           | 45093031
n_tup_upd           | 47289203
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1646966715
n_dead_tup          | 1356331
n_mod_since_analyze | 11498
n_ins_since_vacuum  | 6288
last_vacuum         | 2025-06-03 14:57:43.46009+00
last_autovacuum     | 2025-06-03 19:09:21.595322+00
last_analyze        | 2025-06-03 14:57:54.848185+00
last_autoanalyze    | 2025-06-03 19:09:48.390396+00
vacuum_count        | 2
autovacuum_count    | 3973
analyze_count       | 6
autoanalyze_count   | 3078

--
-- At this point the table is no longer in pg_stat_progress_vacuum and
-- `n_dead_tup` has dropped from 1356331 to 4302
--

-[ RECORD 1 ]-------+------------------------------
relid               | 33378
schemaname          | public
relname             | casino_account_history_lines
seq_scan            | 1122
seq_tup_read        | 178229588443
idx_scan            | 456779364
idx_tup_fetch       | 5539267804
n_tup_ins           | 45093063
n_tup_upd           | 47289232
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1646961282
n_dead_tup          | 4302
n_mod_since_analyze | 11559
n_ins_since_vacuum  | 2
last_vacuum         | 2025-06-03 14:57:43.46009+00
last_autovacuum     | 2025-06-03 19:12:48.107816+00
last_analyze        | 2025-06-03 14:57:54.848185+00
last_autoanalyze    | 2025-06-03 19:09:48.390396+00
vacuum_count        | 2
autovacuum_count    | 3974
analyze_count       | 6
autoanalyze_count   | 3078

This seems normal to me. However, while still looking at pg_stat_user_tables
in a loop, `n_dead_tup` steadily increases (to `5038` in this latest run), at
which point, one second later, the number jumps to above 1.2 million:

Tue 03 Jun 2025 07:13:11 PM UTC (every 1s)

-[ RECORD 1 ]-------+------------------------------
relid               | 33378
schemaname          | public
relname             | casino_account_history_lines
seq_scan            | 1122
seq_tup_read        | 178229588443
idx_scan            | 456784246
idx_tup_fetch       | 5539271612
n_tup_ins           | 45093719
n_tup_upd           | 47289968
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1646961938
n_dead_tup          | 5038
n_mod_since_analyze | 12951
n_ins_since_vacuum  | 658
last_vacuum         | 2025-06-03 14:57:43.46009+00
last_autovacuum     | 2025-06-03 19:12:48.107816+00
last_analyze        | 2025-06-03 14:57:54.848185+00
last_autoanalyze    | 2025-06-03 19:09:48.390396+00
vacuum_count        | 2
autovacuum_count    | 3974
analyze_count       | 6
autoanalyze_count   | 3078

Tue 03 Jun 2025 07:13:12 PM UTC (every 1s)

-[ RECORD 1 ]-------+------------------------------
relid               | 33378
schemaname          | public
relname             | casino_account_history_lines
seq_scan            | 1122
seq_tup_read        | 178229588443
idx_scan            | 456784464
idx_tup_fetch       | 5539271752
n_tup_ins           | 45093746
n_tup_upd           | 47289993
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1647255972
n_dead_tup          | 1290579
n_mod_since_analyze | 2
n_ins_since_vacuum  | 685
last_vacuum         | 2025-06-03 14:57:43.46009+00
last_autovacuum     | 2025-06-03 19:12:48.107816+00
last_analyze        | 2025-06-03 14:57:54.848185+00
last_autoanalyze    | 2025-06-03 19:13:12.125828+00
vacuum_count        | 2
autovacuum_count    | 3974
analyze_count       | 6
autoanalyze_count   | 3079

I don't understand where this large increase is coming from when there are no
corresponding inserts, updates, or deletes (at that magnitude). This entire
process repeats itself and, as mentioned, the same thing is happening on
other observed tables.

I'm running version 'PostgreSQL 15.6 (Ubuntu 15.6-1.pgdg22.04+1)'.

Thanks,
Matt
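For anyone who wants to reproduce the observation, this is roughly the loop
I'm using - a sketch with psql's \watch, trimmed to the interesting columns:

```
SELECT n_live_tup,
       n_dead_tup,
       n_mod_since_analyze,
       last_autovacuum,
       last_autoanalyze,
       autovacuum_count,
       autoanalyze_count
FROM pg_stat_user_tables
WHERE relname = 'casino_account_history_lines'
\watch 1
```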
Re: Sudden increase in n_dead_tup with no corresponding insert/update/delete
> On Jun 3, 2025, at 6:23 PM, David Rowley wrote:
>
> On Wed, 4 Jun 2025 at 07:22, Matthew Tice wrote:
>> Tue 03 Jun 2025 07:13:11 PM UTC (every 1s)
>> n_dead_tup          | 5038
>> autoanalyze_count   | 3078
>
>> Tue 03 Jun 2025 07:13:12 PM UTC (every 1s)
>> n_dead_tup          | 1290579
>> autoanalyze_count   | 3079
>
>> I don't understand where this large increase is coming from when there
>> are no corresponding inserts, updates, or deletes (at that magnitude).
>> This entire process repeats itself and, as mentioned, the same thing is
>> happening on other observed tables.
>
> I imagine it's from the auto-analyze that ran. Analyze will try to
> estimate the live and dead rows, but since analyze only samples some
> blocks, it may come up with something that's not too accurate if the
> blocks it happened to sample don't contain similar percentages of dead
> rows to the entire table.
>
> See [1].
>
> David
>
> [1] https://github.com/postgres/postgres/blob/REL_15_STABLE/src/backend/commands/analyze.c#L1318

Thanks, David.

This table is relatively large (1.6B records, 1.5TB, 38 columns). The
`default_statistics_target` is set to 300 - so I think that 9 may not be
enough to gather accurate statistics.
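If the jump really is ANALYZE extrapolating dead rows from its sample, the
knobs I can think of are the statistics target (as I understand it, ANALYZE
samples roughly 300 rows per point of the largest per-column statistics
target, so a bigger target means more sampled blocks) and how often the table
gets analyzed. A sketch only - `some_column` is a placeholder, not a real
column of this table:

```
-- Larger sample for this table's ANALYZE runs (placeholder column name)
ALTER TABLE casino_account_history_lines
  ALTER COLUMN some_column SET STATISTICS 1000;
ANALYZE VERBOSE casino_account_history_lines;

-- Or re-analyze the table more often so a bad estimate is corrected sooner
ALTER TABLE casino_account_history_lines
  SET (autovacuum_analyze_scale_factor = 0, autovacuum_analyze_threshold = 100000);
```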