autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
We have a cloud service that uses PostgreSQL to temporarily store binary content. We're using PostgreSQL's Large Objects to store the binary content. Each large object lives anywhere from a few hundred milliseconds to 5-10 minutes, after which it is deleted.

Normally, this works just fine and we have no issues. However, roughly every 3 months or so, disk usage on our PostgreSQL database starts to increase at an alarming rate. More specifically, the pg_largeobject table's disk usage steadily increases even though we've deleted most of the large objects. This will continue until the database server runs out of disk space (and then all sorts of bad things happen, of course).

It looks to us like autovacuum against the pg_largeobject table stops working, because the pg_stat_all_tables.last_autovacuum column stops updating, or is updated infrequently, while it is still updated on other tables:

 schemaname |         relname         | n_live_tup | n_dead_tup |        last_autovacuum
------------+-------------------------+------------+------------+-------------------------------
 pg_catalog | pg_largeobject          |      37205 |   92614852 | 2020-06-15 01:55:09.037101+00
 pg_toast   | pg_toast_973434724      |        281 |       3701 | 2020-06-15 17:22:54.746452+00
 pg_catalog | pg_largeobject_metadata |        320 |       1236 | 2020-06-15 17:23:49.765878+00
 pg_catalog | pg_shdepend             |        344 |        933 | 2020-06-15 17:23:54.448855+00
 pg_toast   | pg_toast_2619           |         52 |         85 | 2020-06-15 17:22:01.387933+00
 pg_catalog | pg_statistic            |        413 |        100 | 2020-06-15 17:15:52.656791+00
 pg_catalog | pg_class                |        349 |          2 | 2020-06-12 17:58:13.147931+00
 pg_catalog | pg_attribute            |       2633 |          1 | 2020-06-12 17:58:13.000114+00

Looking at our PostgreSQL logs, it appears that the autovacuum task against the pg_largeobject table takes increasingly more time to run:

2020-06-12T19:41:58.335931494Z stderr F system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 1.77 s
2020-06-12T19:42:59.704884752Z stderr F system usage: CPU: user: 0.02 s, system: 0.02 s, elapsed: 3.06 s
2020-06-12T19:44:01.928300512Z stderr F system usage: CPU: user: 0.06 s, system: 0.01 s, elapsed: 5.44 s
2020-06-12T19:45:14.124209167Z stderr F system usage: CPU: user: 0.11 s, system: 0.05 s, elapsed: 17.13 s
2020-06-12T19:46:16.28758936Z stderr F system usage: CPU: user: 0.13 s, system: 0.08 s, elapsed: 19.04 s
2020-06-12T19:47:34.264882626Z stderr F system usage: CPU: user: 0.20 s, system: 0.19 s, elapsed: 36.22 s
2020-06-12T19:49:15.383436343Z stderr F system usage: CPU: user: 0.28 s, system: 0.38 s, elapsed: 74.06 s
2020-06-12T19:53:47.229361981Z stderr F system usage: CPU: user: 0.66 s, system: 1.06 s, elapsed: 214.12 s
2020-06-12T20:19:39.619748109Z stderr F system usage: CPU: user: 4.13 s, system: 5.30 s, elapsed: 1461.16 s
2020-06-12T21:30:31.634637945Z stderr F system usage: CPU: user: 13.73 s, system: 19.59 s, elapsed: 4225.61 s
2020-06-12T23:54:32.511015886Z stderr F system usage: CPU: user: 33.57 s, system: 41.90 s, elapsed: 8514.23 s
2020-06-13T04:23:32.230960572Z stderr F system usage: CPU: user: 66.09 s, system: 82.95 s, elapsed: 16011.25 s
2020-06-13T12:00:37.43434175Z stderr F system usage: CPU: user: 99.42 s, system: 130.48 s, elapsed: 27296.71 s
2020-06-14T02:40:25.19122979Z stderr F system usage: CPU: user: 202.96 s, system: 263.66 s, elapsed: 52653.66 s
2020-06-15T01:55:09.03766272Z stderr F system usage: CPU: user: 317.54 s, system: 544.48 s, elapsed: 83550.21 s

We have several instances of our cloud service, and each instance has its own database. On other instances, the autovacuum elapsed time is consistently less than 25 seconds on every run.
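For reference, the per-table figures above come from the pg_stat_all_tables view; a query along these lines (an illustrative sketch, not necessarily the exact statement we ran) produces that output:

-- Tables with the most dead tuples first
SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 10;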
Other than the increasing elapsed times for the autovacuum, we don't see any other indication of a problem in the logs (no error messages, etc.).

We're currently using PostgreSQL version 10.10. Our service is JVM-based and we're using the PostgreSQL JDBC driver version 42.2.5.

Have we stumbled upon a potential bug here, or do we need to tweak some autovacuum settings? What should we look at or try next to further troubleshoot this?

Regards,
Jim Hurne
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
Thanks Michael,

Here are our current autovacuum settings:

 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_freeze_max_age           | 2
 autovacuum_max_workers              | 3
 autovacuum_multixact_freeze_max_age | 4
 autovacuum_naptime                  | 1min
 autovacuum_vacuum_cost_delay        | 20ms
 autovacuum_vacuum_cost_limit        | -1
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50
 autovacuum_work_mem                 | -1
 log_autovacuum_min_duration         | 1s

> Is this a destructive queue of sorts with no rows permanently stored?

Essentially, yes. The system can create and delete objects at a high rate. Here are some recent stats on the pg_largeobject table:

# select * from pg_stat_all_tables where relname = 'pg_largeobject';
-[ RECORD 1 ]-------+---------------
relid               | 2613
schemaname          | pg_catalog
relname             | pg_largeobject
seq_scan            | 0
seq_tup_read        | 0
idx_scan            | 66619475
idx_tup_fetch       | 126816721
n_tup_ins           | 57580140
n_tup_upd           | 188474
n_tup_del           | 57640395
n_tup_hot_upd       | 108845
n_live_tup          | 47771
n_dead_tup          | 57787135
n_mod_since_analyze | 115409009
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

# SELECT pg_size_pretty( pg_total_relation_size('pg_largeobject') );
-[ RECORD 1 ]--+-------
pg_size_pretty | 350 GB

The total size of the actual undeleted large objects is about 60 MB.

Regards,
Jim Hurne


From: Michael Lewis
To: Jim Hurne
Cc: PostgreSQL General
Date: 06/16/2020 01:06 PM
Subject: [EXTERNAL] Re: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked

On Tue, Jun 16, 2020 at 10:01 AM Jim Hurne wrote:

> Other than the increasing elapsed times for the autovacuum, we don't see any
> other indication in the logs of a problem (no error messages, etc). We're
> currently using PostgreSQL version 10.10. Our service is JVM-based and we're
> using the PostgreSQL JDBC driver version 42.2.5.
>
> Have we stumbled upon a potential bug here, or do we need to tweak some
> autovacuum settings? What should we look at next or what should we try next
> to further troubleshoot this?

What are your current autovacuum settings? Do you have long running transactions with any frequency? Decreasing autovacuum_vacuum_cost_delay to 1 or 2ms may be prudent (the default changes from 20ms down to 2ms with PG 12). Is this a destructive queue of sorts with no rows permanently stored? If so, I would expect that a daily scheduled vacuum analyze may be the best course of action.
Re: Sv: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
> In my experience vacuumlo, https://www.postgresql.org/docs/12/vacuumlo.html, is needed
> to remove large objects, before vacuum can remove them from pg_largeobject.

Thanks for the suggestion! vacuumlo deletes any orphaned large objects, presumably by using lo_unlink. While we don't use vacuumlo, our service does clean up orphaned large objects in a similar way. And indeed, inspecting the database reveals that there are no orphaned large objects.

Regards,
Jim Hurne
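P.S. For reference, the orphan check amounts to something along these lines (a sketch only; app_files and its content_oid column are placeholders for whatever in the application schema references the large object OIDs):

-- Large objects whose OIDs are no longer referenced by the application
SELECT m.oid
FROM pg_largeobject_metadata m
LEFT JOIN app_files f ON f.content_oid = m.oid   -- hypothetical application table
WHERE f.content_oid IS NULL;

-- Any orphan found this way can then be removed with lo_unlink(oid),
-- which is essentially what vacuumlo automates.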
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
Michael Lewis wrote on 06/16/2020 04:41:16 PM:
> Still, if you run a manual vacuum analyze verbose, what sort of output do you get?

On one of the instances that is exhibiting the "disk leak" behavior, the VACUUM ANALYZE VERBOSE command doesn't generate any output or complete before I lose the connection to the database (presumably because I hit a connection read timeout). Is it possible to configure the read timeout for psql?

On some of our healthy instances, we were able to run VACUUM ANALYZE VERBOSE:

=> vacuum verbose analyze pg_largeobject;
INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  scanned index "pg_largeobject_loid_pn_index" to remove 630 row versions
DETAIL:  CPU: user: 0.06 s, system: 0.10 s, elapsed: 0.17 s
INFO:  "pg_largeobject": removed 630 row versions in 190 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  index "pg_largeobject_loid_pn_index" now contains 533 row versions in 18346 pages
DETAIL:  630 index row versions were removed.
18340 index pages have been deleted, 18339 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_largeobject": found 577 removable, 533 nonremovable row versions in 399 out of 399 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 69015245
There were 550 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.06 s, system: 0.11 s, elapsed: 0.17 s.
INFO:  "pg_largeobject": truncated 399 to 305 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  analyzing "pg_catalog.pg_largeobject"
INFO:  "pg_largeobject": scanned 305 of 305 pages, containing 533 live rows and 0 dead rows; 533 rows in sample, 533 estimated total rows
VACUUM

> Are there indexes on this table that perhaps are very large and
> needing to be rescanned many times because maintenance_work_mem
> isn't high enough to handle in a single pass?

On the same healthy instance where we were able to run the VACUUM ANALYZE VERBOSE, the index size does seem reasonably small:

=> SELECT pg_size_pretty (pg_indexes_size('pg_largeobject'));
-[ RECORD 1 ]--+-------
pg_size_pretty | 143 MB

But on the unhealthy instance, it is much larger:

=> SELECT pg_size_pretty (pg_indexes_size('pg_largeobject'));
 pg_size_pretty
----------------
 7241 MB
(1 row)

It isn't clear to me whether the index size is a symptom or the actual root cause.

> You might try "create index concurrently, drop index concurrently,
> & rename index" (reindex concurrently if you were on
> PG 12) as a sort of online 'vacuum full' on the index(es).

Unfortunately, since pg_largeobject is a system table, the user we use to connect to the database doesn't have permission to do this. We get a "must be owner of relation pg_largeobject" error when we try to create the replacement index (using CREATE INDEX CONCURRENTLY).

> By the way, the best practices for these mailing lists suggest
> partial quoting and responding in-line or below, not "top posting"
> with the entire conversation below.

My mistake! Sorry about that.

Regards,
Jim Hurne
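P.S. For the record, the index swap we attempted was along these lines (a sketch only; the new index name is arbitrary, the first statement is where we hit the ownership error, and we suspect a superuser would additionally need allow_system_table_mods to touch a catalog's indexes):

-- pg_largeobject's only index covers (loid, pageno)
CREATE UNIQUE INDEX CONCURRENTLY pg_largeobject_loid_pn_index_new
    ON pg_catalog.pg_largeobject (loid, pageno);

DROP INDEX CONCURRENTLY pg_catalog.pg_largeobject_loid_pn_index;

ALTER INDEX pg_catalog.pg_largeobject_loid_pn_index_new
    RENAME TO pg_largeobject_loid_pn_index;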
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
We're still struggling to figure out why autovacuum sometimes seems to be failing or ineffective against the pg_largeobject table.

We decided to try a VACUUM FULL VERBOSE on one of our problematic databases. It did complete, but it took a surprisingly long time:

INFO:  vacuuming "pg_catalog.pg_largeobject"
INFO:  "pg_largeobject": found 97326130 removable, 22230 nonremovable row versions in 207508868 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 1085.87 s, system: 3803.66 s, elapsed: 10883.94 s.
VACUUM

So it took about 3 hours, which is surprising because there was only 58 MB of undeleted data in about 65 large objects. When we ran this particular vacuum, we made sure there were no other processes connected to the DB and that no other statements were running that would have locked the table.

It's my understanding that VACUUM FULL works by creating a brand new table and copying the "non-deleted" data into it. If that's the case, I would have expected it to complete very quickly, given how little "non-deleted" data existed at the time.

We are of course going to continue to try different things, but does anyone have any other suggestions on what we should be looking at or what settings we might want to adjust?

For reference, here was my original problem description:

> We have a cloud service that uses PostgreSQL to temporarily store binary
> content. We're using PostgreSQL's Large Objects to store the binary
> content. Each large object lives anywhere from a few hundred milliseconds
> to 5-10 minutes, after which it is deleted.
>
> Normally, this works just fine and we have no issues. However, roughly
> every 3 months or so, disk usage on our PostgreSQL database starts to
> increase at an alarming rate. More specifically, the pg_largeobject
> table's disk usage steadily increases even though we've deleted most of
> the large objects. This will continue until the database server runs out
> of disk space (and then all sorts of bad things happen of course).
>
> It looks to us like autovacuum against the pg_largeobject table stops
> working because the pg_stat_all_tables.last_autovacuum column stops
> updating, or is updated infrequently, while it is still updated on other
> tables.
>
> We're currently using PostgreSQL version 10.10. Our service is JVM-based
> and we're using the PostgreSQL JDBC driver version 42.2.5.

Regards,
Jim Hurne
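P.S. For the curious, the live-data figure quoted above can be estimated with something like the following (a sketch; reading pg_largeobject directly generally requires superuser privileges):

-- Estimate how many large objects remain and how much live data they hold
SELECT count(DISTINCT loid)                    AS live_large_objects,
       pg_size_pretty(sum(octet_length(data))) AS live_data_size
FROM pg_catalog.pg_largeobject;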
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
Magnus Hagander wrote on 06/22/2020 04:44:33 PM:
> Yes, that's how VACUUM FULL works (more or less).

Thanks for the confirmation and further explanation, Magnus! I'm definitely getting a better idea of how the vacuum and autovacuum facilities work.

> And for autovacuum, with a cost_delay of 20ms and a cost_limit of
> 200, autovacuum would spend about 55 hours just on vacuum delay for
> the reads...

Ah, very interesting. That explains (at least in part) why the autovacuums seem to be taking a long time. We'll consider adjusting the cost_delay and cost_limit settings. Are there any downsides to setting a very low cost_delay (maybe even 0ms), or to setting a large cost_limit (such as 100,000)? A sketch of the kind of change we have in mind appears below, after the log excerpt.

> The logs you posted originally seem to be excluding the actual
> autovacuum details -- can you include those?

Sure! Below are more of the details from the same set of logs. Looking at them myself, I see that there is always some percentage of tuples that are dead but not yet removable, and that number increases on every vacuum, which might explain in part why the autovacuum elapsed times keep increasing. What causes a dead tuple to be unremovable?

-- LOGS --

2020-06-12T19:41:58.335881879Z stderr F 2020-06-12 19:41:58 UTC [528141]: [2-1] user=,db=,client= LOG:  automatic vacuum of table "ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:41:58.335899718Z stderr F 	pages: 0 removed, 8649 remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:41:58.335908658Z stderr F 	tuples: 0 removed, 30445 remain, 17410 are dead but not yet removable, oldest xmin: 1942750826
2020-06-12T19:41:58.335917078Z stderr F 	buffer usage: 17444 hits, 0 misses, 0 dirtied
2020-06-12T19:41:58.335925217Z stderr F 	avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
2020-06-12T19:41:58.335931494Z stderr F 	system usage: CPU: user: 0.02 s, system: 0.00 s, elapsed: 1.77 s

2020-06-12T19:42:59.704833148Z stderr F 2020-06-12 19:42:59 UTC [528231]: [2-1] user=,db=,client= LOG:  automatic vacuum of table "ibmclouddb.pg_catalog.pg_largeobject": index scans: 1
2020-06-12T19:42:59.704852755Z stderr F 	pages: 0 removed, 14951 remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:42:59.704861911Z stderr F 	tuples: 2 removed, 53054 remain, 27127 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:42:59.704870553Z stderr F 	buffer usage: 30144 hits, 0 misses, 3 dirtied
2020-06-12T19:42:59.704878195Z stderr F 	avg read rate: 0.000 MB/s, avg write rate: 0.008 MB/s
2020-06-12T19:42:59.704884752Z stderr F 	system usage: CPU: user: 0.02 s, system: 0.02 s, elapsed: 3.06 s

2020-06-12T19:44:01.928225496Z stderr F 2020-06-12 19:44:01 UTC [528326]: [2-1] user=,db=,client= LOG:  automatic vacuum of table "ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:44:01.928265589Z stderr F 	pages: 0 removed, 22395 remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:44:01.928276366Z stderr F 	tuples: 0 removed, 76562 remain, 37235 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:44:01.9282856Z stderr F 	buffer usage: 45131 hits, 0 misses, 433 dirtied
2020-06-12T19:44:01.928293976Z stderr F 	avg read rate: 0.000 MB/s, avg write rate: 0.621 MB/s
2020-06-12T19:44:01.928300512Z stderr F 	system usage: CPU: user: 0.06 s, system: 0.01 s, elapsed: 5.44 s

2020-06-12T19:45:14.124140716Z stderr F 2020-06-12 19:45:14 UTC [528433]: [2-1] user=,db=,client= LOG:  automatic vacuum of table "ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:45:14.124178864Z stderr F 	pages: 0 removed, 31029 remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:45:14.124187376Z stderr F 	tuples: 0 removed, 104441 remain, 65915 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:45:14.124196608Z stderr F 	buffer usage: 57692 hits, 4834 misses, 2095 dirtied
2020-06-12T19:45:14.124203046Z stderr F 	avg read rate: 2.204 MB/s, avg write rate: 0.955 MB/s
2020-06-12T19:45:14.124209167Z stderr F 	system usage: CPU: user: 0.11 s, system: 0.05 s, elapsed: 17.13 s

2020-06-12T19:46:16.287517553Z stderr F 2020-06-12 19:46:16 UTC [528529]: [2-1] user=,db=,client= LOG:  automatic vacuum of table "ibmclouddb.pg_catalog.pg_largeobject": index scans: 0
2020-06-12T19:46:16.287558376Z stderr F 	pages: 0 removed, 36257 remain, 0 skipped due to pins, 0 skipped frozen
2020-06-12T19:46:16.287567454Z stderr F 	tuples: 0 removed, 122273 remain, 81080 are dead but not yet removable, oldest xmin: 1942751009
2020-06-12T19:46:16.287575752Z stderr F 	buffer usage: 62700 hits, 10366 misses, 76 dirtied
2020-06-12T19:46:16.287582866Z stderr F 	avg read rate: 4.252 MB/s, avg write rate: 0.031 MB/s
2020-06-12T19:46:16.28758936Z stderr F 	system usage: CPU: user: 0.13 s, system: 0.08 s, elapsed: 19.04 s
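Regarding the cost settings, the kind of change we have in mind is something along these lines (a sketch only; it assumes access to a superuser role, or that our database provider applies the equivalent configuration change, and the values shown are examples rather than recommendations):

-- Throttle autovacuum less aggressively; both settings take effect on a reload, no restart needed
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '2ms';
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000;
SELECT pg_reload_conf();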
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
"Daniel Verite" wrote on 06/22/2020 06:00:37 PM: > If you can arrange a maintenance window, a faster way to rebuild > pg_largeobject when it contains mostly empty pages can be to: Thanks Daniel. If we cannot arrange for a maintenance window, is this the sort of thing that pg_repack[1] could be used to do? Would pg_repack require setting allow_system_table_mods to on? [1] https://github.com/reorg/pg_repack Regards, Jim Hurne
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
"Daniel Verite" wrote on 06/24/2020 10:08:27 AM: > Unfortunately it [pg_repack] can't work with pg_largeobject That is unfortunate, and potentially leaves us in a difficult spot. Is it possible to configure PosgreSQL to use a user table for large objects instead of a system table? We're finding it to be especially difficult to maintain the pg_largeobject system table (tweak autovacuum settings, manually cleanup with something like pg_repack, etc), particularly since our PosrgreSQL instances are hosted and controlled by another team. Regards, Jim Hurne
Re: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
Michael Lewis wrote on 06/23/2020 04:44:51 PM:
> Long running transactions are the common one that I see. You might
> be dealing with replication slots or prepared transactions.
> Basically, if some process might see that "old truth", then it can't
> be vacuumed away yet.

Thanks, those links you provided were informative. Our application doesn't use prepared transactions, so that one is easy to eliminate.

Our database does use replication, but there's only one replication slot and its xmin and catalog_xmin columns are blank. I presume that slot is the one the other replica is in fact using. I *think* this means replication isn't the issue, but it's hard to say since the xmin column is blank (and none of the reading I've done even mentions that the xmin on a replication slot can be null).

That brings us to transactions. I looked for any long-running idle transactions (transactions that have been idle for more than 5 or 15 minutes), but found none. I tried:

SELECT pid, datname, usename, state, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

But the backend_xmin for all of the rows returned is exactly the same, and that xmin is greater than the oldest xmin reported in the autovacuum logs. It does seem odd that the backend_xmin value isn't changing, though. Is that normal?

So, for replication slots, I'm seeing a null xmin value, and the articles don't say how that should be interpreted. And for transactions, all of the transaction xmins are the same, which also seems odd and not what the articles suggested. I know the transactions themselves are coming and going because I can see the pids changing, but the xmins are always the same. Strange.

Having to check the logs to find the oldest xmin is painful, and it requires that a vacuum or autovacuum has already run. Is there another way to check what the oldest xmin is on the dead tuples of a table?

Regards,
Jim Hurne
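P.S. The other checks described above were roughly along these lines (a sketch; note that pg_stat_activity hides the details of other users' sessions unless the connecting role is a superuser or a member of pg_read_all_stats):

-- Replication slots that could be holding back xmin
SELECT slot_name, active, xmin, catalog_xmin
FROM pg_replication_slots;

-- Any prepared transactions lying around
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts;

-- Sessions sitting idle inside an open transaction
SELECT pid, datname, usename, state,
       now() - xact_start   AS xact_age,
       now() - state_change AS idle_for,
       backend_xmin
FROM pg_stat_activity
WHERE state LIKE 'idle in transaction%'
ORDER BY xact_start;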
RE: autovacuum failing on pg_largeobject and disk usage of the pg_largeobject growing unchecked
"Jim Hurne" wrote on 06/24/2020 03:44:26 PM: > That brings us to transactions. I looked for any long-running idle > transaction (transactions that have been idle for more than 15 or 5 > minutes), but found none. It turns out I didn't see any long running transactions because I was using a user who didn't have permissions to view most of the data in the pg_stat_activity view. When we looked using a more powerful user, we found four transactions that had been idle for a long time. Killing those transactions allowed vacuums to clean up the dead tuples. This is most likely the root of all of our problems, so we can essentially consider this mystery solved for now. Thanks to all who gave us advice and information. It's been highly educational. I'm just sorry it turned out to be a stupid "long running transaction" problem. We'll know to look for that first in the future.