Archiving Data to Another DB?
Let's say I have two DBs: main (9.6.6) and archive (10.2). I have a table in main where I want to archive data older than 60 days. For various reasons, the table is not partitioned, so for now we must use DELETE. The destination table in the archive DB is partitioned with the new Pg10 partitioning.

My initial plan was to have a stored procedure on the archive DB use postgres_fdw to do an INSERT INTO / SELECT to select the data remotely and insert into the local archive table. It would then issue a single DELETE command to remotely remove the data from the main DB. However I found that doing this resulted in the main DB calling thousands (perhaps millions if it's one-per-row) of individual DELETE statements based on a ctid column. Aside from WAL behavior concerns, it is flooding my postgresql server logs since I log any DML.

On top of that, I'm told that a remote DELETE wouldn't be transactional, so if I were to compare inserted rows vs deleted rows and found a mismatch, I couldn't just roll back the DELETE. I plan to verify this with a small test case later but for now I'll assume this to be true.

Right now I'm thinking of falling back to the far-less-elegant method of dumping the data to a flat file via COPY, running psql to connect to the archive DB remotely and running a COPY to load the data (or maybe transferring the flat file to the archive DB to load it there, offloading that part of the workload), then deleting the data from the main DB. I could capture the rows dumped in a control table and compare the rows deleted against that and then roll back the delete if necessary. Like I said, not elegant, but I don't want to risk losing data that wasn't successfully archived to the archive DB.

I'm very interested to hear what others might be doing for tasks like this.

Don.

--
Don Seiler
www.seiler.us
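[For reference, a minimal sketch of the copy/load/delete fallback described above. The table names, file path, and fixed cutoff timestamp are hypothetical; the verification against a control table is omitted here. Pinning the cutoff to a literal timestamp keeps the dump and the delete working on the same row set.]

-- in psql connected to the main DB: dump the candidate rows to a flat file
\copy (SELECT * FROM big_table WHERE created_at < '2018-02-10 00:00:00') TO '/tmp/archive_batch.csv' CSV

-- in psql connected to the archive DB: load the flat file into the partitioned archive table
\copy archive_schema.big_table FROM '/tmp/archive_batch.csv' CSV

-- back on the main DB, after confirming the loaded row count matches the dumped count:
BEGIN;
DELETE FROM big_table WHERE created_at < '2018-02-10 00:00:00';
-- ROLLBACK here instead if the counts don't match
COMMIT;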
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 12:58 PM, Adrian Klaver wrote:

> "F.33.3. Transaction Management
>
> During a query that references any remote tables on a foreign server,
> postgres_fdw opens a transaction on the remote server if one is not already
> open corresponding to the current local transaction. The remote transaction
> is committed or aborted when the local transaction commits or aborts.
> Savepoints are similarly managed by creating corresponding remote
> savepoints.
>
> ..."

Interesting, I'll work on a test case later!

> I may be missing something, but why not reverse your original set up?
> Assuming transactional behavior works as expected something like:
>
> 1) Setup postgres_fdw in main database.
>
> 2) Create FOREIGN TABLE pointing to table in archive database.
>
> 3) INSERT INTO/SELECT from main table to archive table.
>
> 4) DELETE FROM main table.

I had considered this as well, as this would allow me to roll back the delete (assuming my intel on postgres_fdw transactions was correct, which it may not be after all). I wondered if a remote insert would be broken up into individual inserts like the remote delete was, as that would be equally unappealing for the same reasons. But obviously worth confirming.

Don.

--
Don Seiler
www.seiler.us
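[A minimal sketch of the reversed setup Adrian describes, run from the main database. The server name, connection options, credentials, and column list are all hypothetical.]

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER archive_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive-host', dbname 'archive', port '5432');

CREATE USER MAPPING FOR CURRENT_USER SERVER archive_srv
    OPTIONS (user 'archiver', password 'secret');

-- points at the partitioned table in the archive DB
CREATE FOREIGN TABLE archive_big_table (
    id         bigint,
    created_at timestamp,
    payload    text
) SERVER archive_srv OPTIONS (schema_name 'public', table_name 'big_table');

BEGIN;
INSERT INTO archive_big_table
    SELECT * FROM big_table WHERE created_at < '2018-02-10 00:00:00';
DELETE FROM big_table WHERE created_at < '2018-02-10 00:00:00';
COMMIT;  -- per the quoted docs, the remote transaction commits or aborts with the local one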
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 1:13 PM, Don Seiler wrote:

> I had considered this as well, as this would allow me to roll back the
> delete (assuming my intel on postgres_fdw transactions was correct, which
> it may not be after all). I wondered if a remote insert would be broken up
> into individual inserts like the remote delete was, as that would be
> equally unappealing for the same reasons. But obviously worth confirming.

I have now confirmed that a remote INSERT does get broken up into a single INSERT for each row.

--
Don Seiler
www.seiler.us
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 1:33 PM, Adrian Klaver wrote:

> A test case here confirms it sends individual INSERTS:
>
> test_(postgres)# insert into fdw_test_table select * from fdw_test;
> INSERT 0 3
>
> Where fdw_test_table is the remote table and fdw_test is the local one.
>
> postgres-2018-04-11 11:29:23.812 PDT-0LOG:  statement: insert into
> fdw_test_table select * from fdw_test;
> postgres-2018-04-11 11:29:23.812 PDT-0LOG:  execute pgsql_fdw_prep_2:
> INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
> postgres-2018-04-11 11:29:23.812 PDT-0DETAIL:  parameters: $1 = '1', $2 = 'one', $3 = 't'
> postgres-2018-04-11 11:29:23.813 PDT-10140LOG:  execute pgsql_fdw_prep_2:
> INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
> postgres-2018-04-11 11:29:23.813 PDT-10140DETAIL:  parameters: $1 = '2', $2 = 'two', $3 = 'f'
> postgres-2018-04-11 11:29:23.813 PDT-10140LOG:  execute pgsql_fdw_prep_2:
> INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
> postgres-2018-04-11 11:29:23.813 PDT-10140DETAIL:  parameters: $1 = '3', $2 = 'three', $3 = 'f'
>
> So much for that idea(:=

Yeah, I saw the same with a 132 row insert. Now imagine that with a monthly 50 million row insert or delete. :p

Thanks for the confirmation! I'm definitely leaning towards the copy/load/delete method.

Don.

--
Don Seiler
www.seiler.us
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 1:38 PM, Don Seiler wrote:

> Yeah, I saw the same with a 132 row insert. Now imagine that with a
> monthly 50 million row insert or delete. :p Thanks for the confirmation!

I went back to look at the postgres logs on my dev server. These logs are rotated once they hit 2G in size. One typical log from the middle of my test last night hit that in 13 minutes and had over 5.2 million DELETE calls. There are quite a few logs like this.

That would not be fun for the disk space on the log volume, either.

Don.

--
Don Seiler
www.seiler.us
Re: New website
On Wed, Apr 18, 2018 at 5:03 PM, Adrian Klaver wrote:

> I would contact the Webmaster but Contact goes to a big image of an
> elephant head. That is also where Downloads, Support and Donate lands.
> Might have been a good idea to roll out a demo site for testing first. Will
> reserve judgment on the site design until it is functioning.

Those links work fine for me on the new site. I haven't hit any issues yet in my casual click-testing.

--
Don Seiler
www.seiler.us
Re: initdb failing (10.4 centos7)
On Sat, May 19, 2018 at 2:10 PM, Michael Nolan wrote:

> initdb is failing on Centos 7 with 10.4 because the install procedure does
> not change the ownership of the /usr/local/pgsql directory to postgres.
>
> Changing the ownership fixes the problem, but the install procedure should
> do this.

Are you using the PGDG repo packages, or the default CentOS repo packages? You should use PGDG and those should install under /var/lib/pgsql.

Don.

--
Don Seiler
www.seiler.us
Re: Bad performance with cascaded deletes
On Tue, Jun 12, 2018 at 2:26 AM, Haug Bürger wrote:

> "availability_cache_ac_session_id_fkey" FOREIGN KEY (ac_session_id)
> REFERENCES zpg_data.session(id) ON DELETE CASCADE

Do you have an index on availability_cache.ac_session_id? These fields are not automatically indexed and that can lead to horrible performance on cascading operations like this.

--
Don Seiler
www.seiler.us
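[If such an index were missing, a minimal sketch of the fix; the index name and the assumption that availability_cache lives in the zpg_data schema are taken from the constraint definition above, not confirmed.]

CREATE INDEX availability_cache_ac_session_id_idx
    ON zpg_data.availability_cache (ac_session_id);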
Re: Bad performance with cascaded deletes
On Tue, Jun 12, 2018 at 10:48 AM, Don Seiler wrote:

> On Tue, Jun 12, 2018 at 2:26 AM, Haug Bürger wrote:
>
>> "availability_cache_ac_session_id_fkey" FOREIGN KEY (ac_session_id)
>> REFERENCES zpg_data.session(id) ON DELETE CASCADE
>
> Do you have an index on availability_cache.ac_session_id? These fields
> are not automatically indexed and that can lead to horrible performance on
> cascading operations like this.

I'm blind apparently, it's your PK.

--
Don Seiler
www.seiler.us
Re: Issues on 9.5 to 9.6 Upgrade.
On Wed, Jun 13, 2018 at 11:16 AM, wrote:

> Can anyone explain WHY there is such a big difference? Is it the SQL
> statement or a bug in the pgdump ?

Did you analyze the database after upgrading? Or at least the tables in question? Those are very different plans.

--
Don Seiler
www.seiler.us
Vacuum process waiting on BufferPin
Postgres 9.6.6 on CentOS 7

This afternoon I discovered an autovacuum process that had been running for over 6 days. It was waiting on a BufferPin event. I kicked off a manual vacuum+analyze of the table, which automatically killed that autovacuum. This ran for a few minutes before it too was waiting on a BufferPin event. I've never witnessed a vacuum session waiting on BufferPin before.

In pg_locks, I see a handful of sessions that have an AccessShareLock on the table I'm trying to vacuum. My vacuum session has a ShareUpdateExclusiveLock on that relation. All of those sessions look like orphaned reporting sessions sitting "idle in transaction". It's unclear to me why a report job would end up idle in transaction; to my knowledge we don't disable autocommit from the reporting app server.

Anyway, my next step is getting the OK to terminate those idle-in-transaction sessions to see if that gets my vacuum job moving. Meanwhile I'll ask a dev to sort out why they might be sitting idle in transaction; there's no reason for them to be unless the app server connection died and they are zombies. However I'm curious if there is someplace else I could/should also look to get to the root cause of this.

Don.

--
Don Seiler
www.seiler.us
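[A sketch of the kind of pg_locks/pg_stat_activity join used to spot the sessions holding locks on the table being vacuumed; the table name is hypothetical.]

SELECT a.pid,
       a.state,
       a.xact_start,
       l.mode,
       l.granted,
       left(a.query, 60) AS last_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE l.relation = 'some_busy_table'::regclass
 ORDER BY a.xact_start;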
Re: Vacuum process waiting on BufferPin
On Mon, Aug 13, 2018 at 4:15 PM, Alvaro Herrera wrote:

> Maybe you had a cursor that was not fully scanned before the session was
> left idle -- as I recall, those can leave buffers pinned.

I don't quite follow this. What circumstances would lead to this situation? For what it's worth, these sessions are backend reporting jobs, not user-facing at all.

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Mon, Aug 13, 2018 at 4:04 PM, Don Seiler wrote:

> Anyway, my next step is getting the OK to terminate those idle in
> transaction sessions to see if that gets my vacuum job moving. Meanwhile
> I'll ask a dev to sort out why they might be sitting idle in transaction,
> there's no reason for them to be unless the app server connection died and
> they are zombies. However I'm curious if there is someplace else I
> could/should also look to get to the root cause of this.

FYI, killing those idle-in-transaction sessions did free up the vacuum job, which then finished almost immediately afterward.

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Mon, Aug 13, 2018 at 5:55 PM, Vick Khera wrote:

> On Mon, Aug 13, 2018 at 5:19 PM, Don Seiler wrote:
>
>> I don't quite follow this. What circumstances would lead to this
>> situation?
>
> BEGIN WORK;
> DECLARE CURSOR ... ;
> FETCH ...; -- for some number of fetches, which does not reach the end of the cursor.
>
> then just sit there idle, without having closed the cursor or fetching
> anything more.

So the fix in that case would be to ensure that they CLOSE the cursors when done with them?

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Tue, Aug 14, 2018 at 8:23 AM, Vick Khera wrote:

> The general fix is to never sit idle in transaction, but this specific
> case closing the cursor seems like it will also do it.

Of course. I've let development know that they need to sort out why it's left that way. But if they're also NOT closing cursors, that seems like another bad practice to correct. Would commit/rollback automatically close cursors opened in that transaction?

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Tue, Aug 14, 2018 at 8:58 AM, Vik Fearing wrote:

> There is an idle_in_transaction_session_timeout parameter to kill
> connections that are idle (in transaction) for too long. It was
> implemented specifically for cases like this.

Thanks for reminding me of this. I'll probably look to set it but make sure to point out that it is just a safety net to let DB maintenance run and they should make sure their work is committed cleanly if they want to keep it.

Don.

--
Don Seiler
www.seiler.us
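[A sketch of setting that safety net; the 30-minute value and the role name are only illustrative choices.]

-- cluster-wide default for new sessions; applied with a reload, no restart needed
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30min';
SELECT pg_reload_conf();

-- or scope it to just the reporting role so other sessions are unaffected
ALTER ROLE reporting_user SET idle_in_transaction_session_timeout = '30min';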
Odd Row Estimates in Query Plan (rows=75)
PostgreSQL 9.6.6 on CentOS.

We have a report query that has gone from maybe a few seconds to run to a few minutes to run since mid-July. Looking at the output of EXPLAIN ANALYZE, the row count estimates are way off, even though this table was just analyzed a day or so ago. What's more bizarre to me is that the row count estimate is *always* 75 for every node of the plan, where the actual rows is in the hundreds or thousands. This table is one of the busiest tables in our production database (many inserts and updates). It is autovacuumed and autoanalyzed a few times per week, although I'm looking to change it to a nightly manual schedule to avoid daytime autovacuums.

Hash Join  (cost=1869142.34..1869146.15 rows=75 width=88) (actual time=179877.869..179878.011 rows=759 loops=1)
  Hash Cond: (stores.pkey = lt.store_pkey)
  Buffers: shared hit=1654593 read=331897 dirtied=249
  ->  Seq Scan on stores  (cost=0.00..2.77 rows=77 width=22) (actual time=0.007..0.023 rows=78 loops=1)
        Buffers: shared hit=2
  ->  Hash  (cost=1869141.40..1869141.40 rows=75 width=50) (actual time=179877.847..179877.847 rows=759 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 73kB
        Buffers: shared hit=1654591 read=331897 dirtied=249
        ->  Subquery Scan on lt  (cost=1869138.59..1869141.40 rows=75 width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
              Buffers: shared hit=1654591 read=331897 dirtied=249
              ->  GroupAggregate  (cost=1869138.59..1869140.65 rows=75 width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
                    Group Key: lts.store_pkey, lts.owner, (date_trunc('minute'::text, lts.date_gifted))
                    Filter: (count(*) IS NOT NULL)
                    Buffers: shared hit=1654591 read=331897 dirtied=249
                    ->  Sort  (cost=1869138.59..1869138.78 rows=75 width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
                          Sort Key: lts.store_pkey, lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
                          Sort Method: quicksort  Memory: 757kB
                          Buffers: shared hit=1654591 read=331897 dirtied=249
                          ->  Index Scan using gifts_date_added on gifts lts  (cost=0.56..1869136.25 rows=75 width=42) (actual time=190.657..179870.165 rows=6731 loops=1)
                                Index Cond: ((date_added > '2018-07-14 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13 14:14:21'::timestamp without time zone))
                                Filter: ((date_gifted >= '2018-08-13 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13 14:14:21'::timestamp without time zone))
                                Rows Removed by Filter: 938197
                                Buffers: shared hit=1654591 read=331897 dirtied=249
Planning time: 0.426 ms
Execution time: 179893.894 ms

I don't have a version of this query from prior to this summer, but getting explain plans for older data from older sandboxes shows a similar plan.

Sidenote: I am suggesting that an index be added on the date_gifted field as that is far more selective and avoids throwing rows away. However I'm very interested in why every node dealing with the gifts table thinks rows=75 when the actual is much, much higher. And 75 seems like too round of a number to be random?

--
Don Seiler
www.seiler.us
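[A quick sketch of checking when the table was last analyzed and how much it has churned since, which can help rule stale statistics in or out; the table name is taken from the plan above.]

SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_dead_tup,
       n_mod_since_analyze
  FROM pg_stat_user_tables
 WHERE relname = 'gifts';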
Re: Odd Row Estimates in Query Plan (rows=75)
Here's the query, obfuscated manually by me:

SELECT 'Foo' as system_function,
    stores.name as store,
    lt.owner,
    lt.minute_of_day,
    lt.records
FROM foo.stores
LEFT OUTER JOIN
    (SELECT lts.store_pkey,
        lts.owner,
        date_trunc('minute', lts.date_gifted) as minute_of_day,
        count(*) as records
    FROM foo.gifts lts
    WHERE lts.date_added > '2017-07-14 11:13:05'
    AND lts.date_added < '2017-08-13 14:14:21'
    AND lts.date_gifted >= '2017-08-13 11:13:05'
    AND lts.date_gifted < '2017-08-13 14:14:21'
    GROUP BY 1,2,3
    ORDER BY 1
    ) lt ON lt.store_pkey = stores.pkey
WHERE lt.records IS NOT NULL;

The foo.gifts table is pretty much the core table of our database. It's big and very active. There is an index on date_added but not yet on date_gifted. I'm working to re-write the query while the dev sees if we even need this query anymore.

On Wed, Aug 15, 2018 at 2:39 PM, Adrian Klaver wrote:

> On 08/15/2018 12:31 PM, Don Seiler wrote:
>
>> PostgreSQL 9.6.6 on CentOS.
>>
>> We have a report query that has gone from maybe a few seconds to run to a
>> few minutes to run since mid-July. Looking at the output of EXPLAIN
>> ANALYZE, the row count estimates are way off, even though this table was
>> just analyzed a day or so ago. What's more bizarre to me is that the row
>> count estimate is *always* 75 for every node of the plan, where the actual
>> rows is in the hundreds or thousands. This table is one of the busiest
>> tables in our production database (many inserts and updates). It is
>> autovacuumed and autoanalyzed a few times per week, although I'm looking to
>> change it to a nightly manual schedule to avoid daytime autovacuums.
>>
>> Hash Join  (cost=1869142.34..1869146.15 rows=75 width=88) (actual time=179877.869..179878.011 rows=759 loops=1)
>>   Hash Cond: (stores.pkey = lt.store_pkey)
>>   Buffers: shared hit=1654593 read=331897 dirtied=249
>>   ->  Seq Scan on stores  (cost=0.00..2.77 rows=77 width=22) (actual time=0.007..0.023 rows=78 loops=1)
>>         Buffers: shared hit=2
>>   ->  Hash  (cost=1869141.40..1869141.40 rows=75 width=50) (actual time=179877.847..179877.847 rows=759 loops=1)
>>         Buckets: 1024  Batches: 1  Memory Usage: 73kB
>>         Buffers: shared hit=1654591 read=331897 dirtied=249
>>         ->  Subquery Scan on lt  (cost=1869138.59..1869141.40 rows=75 width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
>>               Buffers: shared hit=1654591 read=331897 dirtied=249
>>               ->  GroupAggregate  (cost=1869138.59..1869140.65 rows=75 width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
>>                     Group Key: lts.store_pkey, lts.owner, (date_trunc('minute'::text, lts.date_gifted))
>>                     Filter: (count(*) IS NOT NULL)
>>                     Buffers: shared hit=1654591 read=331897 dirtied=249
>>                     ->  Sort  (cost=1869138.59..1869138.78 rows=75 width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
>>                           Sort Key: lts.store_pkey, lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
>>                           Sort Method: quicksort  Memory: 757kB
>>                           Buffers: shared hit=1654591 read=331897 dirtied=249
>>                           ->  Index Scan using gifts_date_added on gifts lts  (cost=0.56..1869136.25 rows=75 width=42) (actual time=190.657..179870.165 rows=6731 loops=1)
>>                                 Index Cond: ((date_added > '2018-07-14 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13 14:14:21'::timestamp without time zone))
>>                                 Filter: ((date_gifted >= '2018-08-13 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13 14:14:21'::timestamp without time zone))
>>                                 Rows Removed by Filter: 938197
>>                                 Buffers: shared hit=1654591 read=331897 dirtied=249
>> Planning time: 0.426 ms
>> Execution time: 179893.894 ms
>>
>> I don't have a version of this query from prior to this summer, but
>> getting explain plans for older data from older sandboxes shows a similar plan.
Re: Odd Row Estimates in Query Plan (rows=75)
On Wed, Aug 15, 2018 at 3:31 PM, Adrian Klaver wrote:

> lts.date_added > '2017-07-14 11:13:05'
>
> and
>
> lts.date_gifted >= '2017-08-13 11:13:05'
> ?
>
> In other words one '>' and the other '>=' ?

The date_added filters were added just to use that index and with a broad range, since there isn't a filter on date_gifted. You'll notice the date_added range is 30 days but the date_gifted range is 3 hours. We really only care about date_gifted but at this time there isn't an index on that field.

Even as I experiment with some query rewrites, the EXPLAIN ANALYZE always says rows=75. I'm *very* curious to see why it is using that value.

Don.

--
Don Seiler
www.seiler.us
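[A sketch of the index the thread suggests adding, using CONCURRENTLY since the table is busy; the index name is hypothetical.]

CREATE INDEX CONCURRENTLY gifts_date_gifted_idx
    ON foo.gifts (date_gifted);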
Re: Replication question
> I thought I read somewhere that in 9.6, as long as the WAL log is
> available on disk or in the archive the replication server will provide
> that to the replication client, and my archive NFS mount didn’t have to be
> available to all replication clients.

Streaming replication will only read from the WAL files in the $PGDATA/pg_xlog directory. It will not read from archives. So, yes, you would need your NFS mount on the replica (or otherwise copy the archive files to the replica).

Don.

--
Don Seiler
www.seiler.us
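[For a 9.6 replica, archived WAL is normally pulled in via restore_command in recovery.conf rather than over the streaming connection; a minimal sketch with a hypothetical NFS path and connection string.]

# recovery.conf on the replica (PostgreSQL 9.6)
standby_mode = 'on'
primary_conninfo = 'host=primary-host port=5432 user=replicator'
restore_command = 'cp /mnt/wal_archive/%f %p'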
Not Null Constraint vs Query Planning
If I have a NOT NULL constraint on a column, and then run a query where that column IS NULL, does the optimizer "short-circuit" the query to return 0 rows right away? If so, is there a way to see that it is doing so? I've been running a few explain plans this morning and they all look the same. Here I create a table in PG10 with 10 million dummy rows, no indexes, vacuumed/analyzed. I then query before and after and don't notice much difference. Wondering if there's any way to see an indication that the constraint was used in the query planning. My sample runs are at the end of this email.

The REAL reason for this is that I'm wondering if I created a NOT NULL check constraint with "NOT VALID", would that then NOT be considered in such a "short-circuit" case until I ran the VALIDATE CONSTRAINT on it? Perhaps I should have just asked this in its own thread but I started diving into the query plan thing.

postgres=# explain (analyze, buffers) select name from people where created_at is null;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on people  (cost=0.00..203093.21 rows=1 width=33) (actual time=5365.886..5365.886 rows=0 loops=1)
   Filter: (created_at IS NULL)
   Rows Removed by Filter: 10000000
   Buffers: shared hit=12828 read=90265
 Planning time: 2.558 ms
 Execution time: 5379.862 ms
(6 rows)

postgres=# alter table people alter column created_at set not null;
ALTER TABLE
postgres=# vacuum analyze people;
VACUUM
postgres=# explain (analyze, buffers) select name from people where created_at is null;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on people  (cost=0.00..203092.49 rows=1 width=33) (actual time=2339.254..2339.254 rows=0 loops=1)
   Filter: (created_at IS NULL)
   Rows Removed by Filter: 10000000
   Buffers: shared hit=12938 read=90155
 Planning time: 0.390 ms
 Execution time: 2339.274 ms
(6 rows)

--
Don Seiler
www.seiler.us
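[A sketch of the NOT VALID check constraint being described, against the same people table; the constraint name is hypothetical.]

-- added without scanning existing rows; new writes are checked immediately
ALTER TABLE people
    ADD CONSTRAINT people_created_at_not_null
    CHECK (created_at IS NOT NULL) NOT VALID;

-- later, scan the table and mark the constraint as validated
ALTER TABLE people VALIDATE CONSTRAINT people_created_at_not_null;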
Re: Not Null Constraint vs Query Planning
On Mon, Mar 2, 2020, 12:30 Vik Fearing wrote:

> On 02/03/2020 18:09, Don Seiler wrote:
>> The REAL reason for this is that I'm wondering if I created a NOT NULL
>> check constraint with "NOT VALID" would that then NOT be considered in such
>> a "short-circuit" case until I ran the VALIDATE CONSTRAINT on it? Perhaps I
>> should have just asked this in its own thread but I started diving into the
>> query plan thing.
>
> You cannot do this because NOT NULL isn't a real constraint (meaning it
> does not appear in pg_constraint). There have been several attempts to
> make it a real constraint over the years but so far nothing has come of
> them.

Using the check constraint method seemed to allow for the "not valid" step. I'm curious what the difference is between a NOT NULL check constraint versus setting the column to NOT NULL (assuming both are validated).

Don.
Mixed Locales and Upgrading
Good morning,

I have a few clusters that need to be upgraded from PG 9.6 or 10 to 12. Normally I'd just plan to pg_upgrade the lot and be good to go. However I've found that quite a few (including our biggest/busiest database) have mixed locales.

In the case of the biggest/busiest database, the cluster was created with locale en_US (NOT en_US.UTF-8), and so the databases have encoding LATIN1. However this database has encoding UTF8 while still having ctype and collation of en_US. I've since found that when this was last upgraded, they ran "update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'test';" to change the encoding.

In my testing, pg_upgrade breaks when trying to restore this since UTF8 isn't supported in en_US for the CREATE DATABASE command used during pg_restore:

command: "/usr/lib/postgresql/12/bin/pg_restore" --host /var/lib/postgresql --port 50432 --username postgres --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_16385.custom" >> "pg_upgrade_dump_16385.log" 2>&1
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2164; 1262 16385 DATABASE test postgres
pg_restore: error: could not execute query: ERROR:  encoding "UTF8" does not match locale "en_US"
DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
Command was: CREATE DATABASE "test" WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US' LC_CTYPE = 'en_US';

Is there a way around this while still using pg_upgrade? My understanding is that a full dump and restore into a new DB with everything set to en_US.UTF-8 is the only way to convert these (if I have to dump/restore, I wouldn't want to keep the mixed environment). Even with parallel jobs, I imagine that's a bit of downtime but I'll have to wait until I can get a copy of prod data to test with to be sure.

Is logical replication an option here? Either maintaining the mixed environment or converting everything to en_US.UTF-8? I'm relatively new in this shop but I'm told they didn't mean to use en_US and there's no reason they wouldn't want to just use the standard/default UTF-8.

Thanks,
Don.

--
Don Seiler
www.seiler.us
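[A quick sketch of seeing the mismatched settings per database, run on each cluster.]

SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
  FROM pg_database
 ORDER BY datname;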
Re: Mixed Locales and Upgrading
On Mon, Mar 16, 2020 at 10:28 AM Tom Lane wrote:

> Egad.

My thoughts exactly.

> Well, in principle you could likewise manually update pg_database's
> datcollate and datctype columns to say "en_US.utf8". However, there's
> a much bigger problem here --- what steps if any did this cowboy take
> to ensure that the data inside the database was valid UTF8?

No steps that I've seen from the chat history I've been able to search. I'm not sure if there was an (invalid) assumption that LATIN1 is a subset of UTF-8 or if it was done in a panic to get the import/update working years ago.

> I don't think you should use pg_upgrade here at all. A dump/restore
> is really the only way to make sure that you have validly encoded data.

That is what I thought, and probably not what they'll want to hear given the downtime involved. Even with parallel dump/restore jobs, I imagine it will take quite a while (this first DB is almost 900GB).

> However, if it's only one database out of a bunch, you could do something
> like
>
> * pg_dump that one database;
> * drop said database;
> * pg_upgrade everything else;
> * restore that one database from dump.

In the case of this busy cluster, the layout is like this:

postgres=# \l
                                List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 f_all     | postgres | UTF8     | en_US      | en_US      |
 postgres  | postgres | LATIN1   | en_US      | en_US      | =Tc/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template0 | postgres | LATIN1   | en_US      | en_US      | =c/postgres           +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

So the template1 DB was dropped and recreated with the collate and ctype as well and then set to be a template again. But I believe that was well after f_all was changed (and so probably no need for the template1 change).

In this case, if this is the only DB in the cluster, would it make sense to just create a new one as en_US.UTF-8 and then restore a dump of f_all into a pre-created en_US.UTF-8 DB?

We have a few other mixed environments similar to this as well. Some have postgres and both template DBs with this same UTF8/en_US/en_US configuration.

Is logical replication an option here? If the target DB were set up as en_US.UTF-8 across the board, would logical replication safely replicate and convert the data until we could then cut over?

Thanks,
Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Tue, Mar 17, 2020 at 8:06 AM Don Seiler wrote:

> On Mon, Mar 16, 2020 at 10:28 AM Tom Lane wrote:
>
>> Well, in principle you could likewise manually update pg_database's
>> datcollate and datctype columns to say "en_US.utf8". However, there's
>> a much bigger problem here --- what steps if any did this cowboy take
>> to ensure that the data inside the database was valid UTF8?

Is there a way to programmatically check for data that might be a problem now?

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Tue, Mar 17, 2020 at 8:56 AM Tom Lane wrote:

> Yikes. Well, if there aren't obvious operational problems, it might be
> that the data is actually UTF8-clean, or almost entirely so. Maybe you
> could look at the problem as being one of validation. In that case,
> it'd be possible to consider not taking the production DB down, but just
> doing a pg_dump from it and seeing if you can restore somewhere else.
> If not, fix the broken data; repeat till clean. After that you could
> do pg_upgrade with a clear conscience. I think you'll still end up
> manually fixing the inconsistent datcollate/datctype settings though.

For this test, would we restore into an en_US.UTF-8/UTF8 database? Then, assuming no errors (or fixing any errors until clean), we change the datcollate/datctype settings in prod and proceed with pg_upgrade (obviously after testing all of that heavily)?

What are the ramifications of changing collation like that? Should we consider rebuilding indexes ASAP after that?

Don.

--
Don Seiler
www.seiler.us
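[A rough sketch of the validation pass Tom describes, restoring into a scratch database created with the target locale; the host names and dump file path are hypothetical.]

# scratch database with the desired locale/encoding
createdb -h scratch-host -T template0 -E UTF8 --locale=en_US.UTF-8 f_all_validate

# dump production and attempt a restore; invalid byte sequences will error out
pg_dump -h prod-host -Fc -f f_all.dump f_all
pg_restore -h scratch-host --exit-on-error -d f_all_validate f_all.dump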
Re: Mixed Locales and Upgrading
On Tue, Mar 17, 2020 at 9:25 PM Michael Paquier wrote:

> There is no way to know how much indexes would get broken without
> having a look at it. Anything ASCII-based should be of no problem.
> If you have a doubt, reindexing every index which includes text column
> data is the best course of action in my opinion if you have any
> doubts, because that's safe even if it has a higher cost.

Here's the fun part. A lot of the tables use UUIDv4 strings for primary keys. However these are stored in text/varchar columns.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Sun, Mar 22, 2020 at 4:48 PM Don Seiler wrote:

> Here's the fun part. A lot of the tables use UUIDv4 strings for primary
> keys. However these are stored in text/varchar columns.

Actually, would I need to re-index on text columns that we know contain UUID strings? UUID characters seem to be pretty basic alphanumeric ASCII characters.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Mon, Mar 30, 2020 at 4:30 PM Tom Lane wrote:

> Don Seiler writes:
>> Actually, would I need to re-index on text columns that we know contain
>> UUID strings? UUID characters seem to be pretty basic alphanumeric ASCII
>> characters.
>
> I think you're all right with respect to those, since they're the
> same under any encoding. It's columns containing non-ASCII characters
> that you'd want to worry about reindexing.

That's what I was hoping to hear. Thanks!

Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Mon, Mar 30, 2020 at 4:39 PM Don Seiler wrote:

> On Mon, Mar 30, 2020 at 4:30 PM Tom Lane wrote:
>
>> Don Seiler writes:
>>> Actually, would I need to re-index on text columns that we know contain
>>> UUID strings? UUID characters seem to be pretty basic alphanumeric ASCII
>>> characters.
>>
>> I think you're all right with respect to those, since they're the
>> same under any encoding. It's columns containing non-ASCII characters
>> that you'd want to worry about reindexing.

Follow-up question: the locale setting on the host would still be set to en_US (as would the postgres and template0 databases). Should I look to change that locale on the system to en_US.UTF-8, or even just for the postgres user that the DB cluster runs as? What are the ramifications of doing (or not doing) so?

Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Tue, Apr 7, 2020 at 11:41 AM Don Seiler wrote:

> Follow-up question: the locale setting on the host would still be set to
> en_US (as would the postgres and template0 databases). Should I look to
> change that locale on the system to en_US.UTF-8, or even just for the
> postgres user that the DB cluster runs as? What are the ramifications of
> doing (or not doing) so?

One more question around the GUC settings for locale. It is currently set to this:

# select name,setting from pg_settings where name like 'lc%';
    name     | setting
-------------+---------
 lc_collate  | en_US
 lc_ctype    | en_US
 lc_messages |
 lc_monetary | C
 lc_numeric  | C
 lc_time     | C

Since I'm not changing the postgres or template0 databases (leaving those as en_US/LATIN1), do I keep lc_collate/lc_ctype as en_US? It's just the template1 and application database that I've set to en_US.UTF-8.

I'm also struggling to see how lc_messages is an empty string. It is commented out in postgresql.conf but suggests 'C' will be the default. The OS locale LC_MESSAGES is set to en_US on the primary but I also see it is set to en_US on the newer replica hosts. What value would be used for lc_messages? I'm trying to create an empty DB with these same settings but if I omit --lc-messages it uses the OS locale value, and I can't set it to an empty string.

Don.

--
Don Seiler
www.seiler.us
template0 needing vacuum freeze?
PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both over 50% towards TXID wraparound. I could vacuum template1 but couldn't vacuum template0 without first allowing connections. This is what it looked like before:

# SELECT datname, age(datfrozenxid), current_setting('autovacuum_freeze_max_age') FROM pg_database ORDER BY 2 DESC;
  datname  |    age     | current_setting
-----------+------------+-----------------
 foo_db    | 1022106099 | 200000000
 template0 | 1000278345 | 200000000
 postgres  |     643729 | 200000000
 template1 |     643729 | 200000000
(4 rows)

I've since allowed connections and ran "vacuumdb --freeze" on it and then immediately disabled the connections to it again. But I'm curious how template0 would be growing in age like this. Even now I see the template0 age growing. I can say that these DBs have previously been altered for locale changes as well.

I'm also running a long "vacuum freeze" on foo_db that will take a few days, after seeing that an autovacuum on a big table had been running since Feb 2 and making no progress, with over 850M dead tuples according to pg_stat_all_tables. I estimate 3-4 more days to go on that one. Once that's done I'll be scheduling manual vacuum jobs. Just wondering if that would somehow affect regular template0 cleanup though.

I don't see anything in the postgres log related to template0 other than my manual interactions today.

--
Don Seiler
www.seiler.us
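[For reference, a sketch of the manual template0 freeze described above; assumes superuser access.]

-- temporarily allow connections to template0
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;

-- from the shell:
--   vacuumdb --freeze -d template0

-- then lock it back down
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS false;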
Re: template0 needing vacuum freeze?
On Sat, May 16, 2020 at 12:44 PM Tom Lane wrote:

> So it's unsurprising that the freeze age increases until autovacuum
> decides to do something about it. I'm suspicious that your alert settings
> are too aggressive and are notifying you before autovacuum kicks in.
>
> You should *not* have had to do anything manual about this, unless you
> have frobbed your autovac settings to the point of brokenness.

Shouldn't autovacuum have kicked in when the age of a table reaches 200M (our autovacuum_freeze_max_age is left at that default)? I see other tables in our app DB triggering the autovacuum "to prevent wrap-around" when they reach 200M. That's what had me concerned to see template0 with an age over 1B and no autovacuum even trying to clean up for it.

Don.

--
Don Seiler
www.seiler.us
Re: template0 needing vacuum freeze?
On Mon, May 18, 2020 at 1:40 AM Laurenz Albe wrote:

> Did you see any weird messages when you vacuumed "template0"?

No.

> Did "datfrozenxid" shrink after the operation?
>
> "foo_db" seems to be the bigger problem. Perhaps autovacuum never handled
> "template0" because it concluded (rightly) that it has to deal with
> "foo_db" first.

Yes, this DB had a table in it that had been autovacuuming since Feb 2. Its age is halfway to wraparound so I'm in the middle of a manual VACUUM FREEZE on it. I'd be interested in knowing if that prevents template0 from autovacuuming itself. There are no other autovacuum jobs running.

>> I can say that these DBs have previously been altered
>> for locale changes as well.
>
> Would you care to explain that? You changed "template0"? How?

It was changed before my time here to change the encoding from LATIN1 to UTF manually. One of the fun treats I'm working to correct as I also prepare these for upgrade to PG12.

> What are your non-default autovacuum settings? Perhaps you should speed up
> autovacuum by reducing "autovacuum_vacuum_cost_delay" to 2ms or less, and
> by increasing "maintenance_work_mem".

All autovacuum settings on this DB are default. Cost delay is at the default 20ms. maintenance_work_mem I've already increased to 512MB (this VM has 8GB RAM).

--
Don Seiler
www.seiler.us
Re: template0 needing vacuum freeze?
On Mon, May 18, 2020 at 8:51 PM Tom Lane wrote:

> Do you have an idea why autovac was failing to clear the issue on that one
> problem table, though?

Before I intervened, the maintenance_work_mem was only 16MB (they had encoded and carried over PG 8 defaults in the chef recipe). I bumped it to 512MB before kicking off my freeze. That's the big factor I can think of.

This is a huge table obviously as well. By the time it caught my attention there were 850M dead tuples to be cleaned up. My VACUUM FREEZE has been running for just about 5 days and is probably halfway done. We shouldn't be at risk of hitting wraparound though (only 52% there).

Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
Good morning,

Back with a follow-up question to all this. I'm wondering if we shouldn't also change the locale settings for postgres/template0/template1 to match our new desired setting of en_US.UTF-8 with UTF8 encoding. We haven't written anything to postgres. Some of our DB clusters have changed these already (but still leaving things mixed) but I want to make things uniform for future upgrades to go a lot more smoothly.

Some examples of current DB clusters that need to be fixed. The first one has the more problematic incompatible mixing within the DBs (the original point of this thread), the others just have different settings across DBs that I'd like to change going forward as well.

  datname  | encoding | datcollate |  datctype  | size_mb
-----------+----------+------------+------------+---------
 xx        | UTF8     | en_US      | en_US      | 1390789
 postgres  | UTF8     | en_US      | en_US      |       6
 template0 | UTF8     | en_US      | en_US      |       6
 template1 | UTF8     | en_US      | en_US      |       6

  datname  | encoding | datcollate |  datctype  | size_mb
-----------+----------+------------+------------+---------
 xxx       | UTF8     | en_US.utf8 | en_US.utf8 |    2178
 postgres  | LATIN1   | en_US      | en_US      |       7
 template0 | LATIN1   | en_US      | en_US      |       7
 template1 | UTF8     | en_US.utf8 | en_US.utf8 |       7

  datname  | encoding | datcollate |  datctype  | size_mb
-----------+----------+------------+------------+---------
 xx        | UTF8     | en_US.utf8 | en_US.utf8 |     345
 postgres  | LATIN1   | en_US      | en_US      |       7
 template0 | UTF8     | en_US.utf8 | en_US.utf8 |       7
 template1 | UTF8     | en_US.utf8 | en_US.utf8 |       7

For the smaller DBs I was planning to just dump/restore into a new cluster as I upgrade to PG12 as well. However two that have the problem are the two biggest ones where the downtime for dump/restore would be too great. So I'm wondering if there is risk or harm in running an UPDATE pg_database command on postgres/template0/template1 as needed and re-indexing afterward.

--
Don Seiler
www.seiler.us
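[A sketch of the catalog tweak being considered, as discussed earlier in the thread, followed by reindexing; this is the kind of change to test on a copy first.]

-- as superuser: point the template/postgres databases at the new locale
UPDATE pg_database
   SET datcollate = 'en_US.utf8',
       datctype   = 'en_US.utf8'
 WHERE datname IN ('postgres', 'template0', 'template1');

-- then, connected to each affected database in turn, rebuild indexes
-- that could be affected by the collation change
REINDEX DATABASE postgres;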
autovacuum on primary blocking queries on replica?
- PostgreSQL 12.9
- PGDG Ubuntu 18.04 image
- Streaming physical replication
- hot_standby_feedback = on

We use a read replica to offload a lot of (what should be) quick queries. This morning we had an incident where these queries were all blocking on AccessShareLock waits, written to the log as:

2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp LOG:  process 8185 still waiting for AccessShareLock on relation 16834 of database 16401 after 1000.228 ms at character 204
2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp DETAIL:  Process holding the lock: 10822. Wait queue: 32373, 8185, 13782, 14290, 13215, 4427, 16056, 13446, 13448, 13445, 9268, 11784, 14469, 14502, 14340, 6510, 8730, 10876, 13012, 15760, 6645, 14712, 13006, 15923, 14485, 15924, 13447, 9926, 13005, 11803, 13013, 13298, 16059, 9266, 4943, 14470, 13148, 12118, 14491, 9927, 11791, 15097, 11804, 16060, 14711, 5469, 8113, 16061, 14698, 14475, 10620, 13150, 14501, 14465, 14341, 16062, 16065, 14484, 7060, 11785, 16068, 16063, 16064, 16066.

This went on for 30 seconds (the value of max_standby_streaming_delay) until PG killed the blocking process:

2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp FATAL:  terminating connection due to conflict with recovery
2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp DETAIL:  User was holding a relation lock for too long.

I'm trying to find the root cause of why this started. We did see an UPDATE that was executed at 2022-05-27 15:23:37.000 UTC on the table in question (relation 16384) that ran against all rows of the table (only ~8200 rows, not huge) but the transaction was rolled back (due to a syntax error later in the transaction). 15 seconds later we then see an aggressive autovacuum on this table:

2022-05-27 15:23:52.507 UTC [30513] LOG:  automatic aggressive vacuum of table "foo_all.public.industry": index scans: 1
        pages: 252 removed, 323 remain, 0 skipped due to pins, 0 skipped frozen
        tuples: 8252 removed, 8252 remain, 0 are dead but not yet removable, oldest xmin: 1670999292
        buffer usage: 12219 hits, 137 misses, 54 dirtied
        avg read rate: 2.372 MB/s, avg write rate: 0.935 MB/s
        system usage: CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.45 s

and less than a second after that is when we see the first AccessShareLock message on the replica.

I've been reading tales of autovacuum taking an AccessExclusiveLock when truncating empty pages at the end of a table. I'm imagining that updating every row of a table and then rolling back would leave all of those rows empty at the end and qualify for truncation and lead to the scenario I saw this morning. I'm still not entirely satisfied since that table in question was so small (only 8252 rows), so I wouldn't imagine it would hold things up as long as it did. Although the blocking session on the replica was an application session, not any background/recovery process.

I'm wondering if I'm on the right trail or if there is a much firmer explanation for what happened.

Thanks,
Don.

--
Don Seiler
www.seiler.us
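[A small sketch of resolving the relation OID from those log lines back to a table name; it must be run while connected to the database with OID 16401.]

SELECT 16834::regclass AS waiting_on_relation;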
Re: Compile and build portable postgresql for mac
On Wed, Oct 24, 2018 at 7:22 PM Pratik Parikh wrote:

> How to compile and build portable postgresql for mac. Can someone
> point me how the builds are done for general distribution. A script would
> be appreciated if anyone has it.

I'm not sure what you mean by "portable"? I'm going to answer assuming you just want to run postgres on your own Mac. There are a couple of options.

Peter Eisentraut has a great collection of Homebrew scripts that you can plug into: https://github.com/petere/homebrew-postgresql

I've recently switched to pgenv, which provides a clean CLI for building and using all the various supported versions: https://github.com/theory/pgenv

Don.

--
Don Seiler
www.seiler.us
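[For illustration, a typical pgenv session looks roughly like the following; the exact subcommands and version number are assumptions based on the project's README rather than something verified here.]

# assumed pgenv usage; see https://github.com/theory/pgenv for the authoritative docs
pgenv available     # list versions pgenv knows how to build
pgenv build 11.0    # download, compile, and install 11.0
pgenv use 11.0      # initialize (if needed) and start that version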
Re: Postgres 11 chooses seq scan instead of index-only scan
On Thu, Jan 24, 2019 at 9:01 AM twoflower wrote:

> Yes! That was it, after running VACUUM TABLE history_translation, the
> query is now executed using index-only scan.
>
> I was under the impression that ANALYZE TABLE history_translation is
> enough, but it is not.

Only a VACUUM will update the visibility map.

https://www.postgresql.org/docs/current/storage-vm.html

I used to think the same, that ANALYZE was enough, coming from an Oracle background. I learned later that the visibility map isn't just used to determine what to vacuum, but it is used by the optimizer/planner when evaluating execution plans.

--
Don Seiler
www.seiler.us
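[A quick sketch of checking how much of a table the planner considers all-visible, which feeds index-only scan costing; the table name is taken from the quoted message.]

SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
  FROM pg_class
 WHERE relname = 'history_translation';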
Native Logical Replication Initial Import Qs
Good afternoon.

I'm looking at having to move a fleet of PG 12 databases from Ubuntu 18.04 to Ubuntu 22.04. This means crossing the dreaded libc collation change, so we're looking to have to migrate via pg_dump/restore or logical replication for the bigger/busier ones. We're also planning to use PG 15 on the destination (Ubuntu 22.04) side to kill two birds with one stone, as much as I'd prefer to have minimal moving parts.

On the logical replication front, the concern is with the initial data import that happens when the subscription is created (by default). I know that you can tell the subscription to not copy data and instead use pg_dump and a replication slot snapshot to achieve this manually. However I'm unable to explain (to myself) why this is better than just having the subscription do it upon creation. Given that I can create pub/sub sets for individual tables for parallel operations, I'm curious what advantages there are in using pg_dump to do this import.

I had been planning to have pg_dump pipe directly into the destination database via psql. Is this faster than just having the subscription do the import? I'm curious as to why or not. I know to only use the minimal indexes required on the destination side (ie identity-related indexes) and omit other indexes and constraints until after the data is loaded, but that is true for either method.

Thanks,
Don.

--
Don Seiler
www.seiler.us
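[For reference, a minimal sketch of the two initial-load styles being compared; the connection string, publication/subscription names, and table list are hypothetical.]

-- on the PG 12 publisher
CREATE PUBLICATION pub_batch1 FOR TABLE t1, t2, t3;

-- on the PG 15 subscriber: let the subscription do the initial table sync itself
CREATE SUBSCRIPTION sub_batch1
    CONNECTION 'host=old-host dbname=app user=repl'
    PUBLICATION pub_batch1
    WITH (copy_data = true);

-- alternative: load the data out-of-band (e.g. pg_dump | psql) and only stream changes
CREATE SUBSCRIPTION sub_batch1
    CONNECTION 'host=old-host dbname=app user=repl'
    PUBLICATION pub_batch1
    WITH (copy_data = false);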
Re: Native Logical Replication Initial Import Qs
On Wed, Jun 7, 2023 at 4:30 PM Jeremy Schneider wrote:

> On 6/7/23 2:12 PM, Don Seiler wrote:
>> On the logical replication front, the concern is with the initial data
>> import that happens when the subscription is created (by default). I
>> know that you can tell the subscription to not copy data and instead use
>> pg_dump and a replication slot snapshot to achieve this manually.
>> However I'm unable to explain (to myself) why this is better than just
>> having the subscription do it upon creation. Given that I can create
>> pub/sub sets for individual tables for parallel operations, I'm curious
>> what advantages there are in using pg_dump to do this import.
>
> FWIW, I think the place this feature shines the most is when you can
> safely leverage things like storage-level snapshots. Sometimes that
> means you can get a copy of a multi-TB database almost instantly if the
> storage or filesystem does copy-on-write, for example database lab
> (postgres.ai) which uses ZFS.
>
> Another thing I can think of is that while it's true you can create
> multiple pub/sub sets, I'm not sure you can reduce the number of sets
> later. So if you were concerned about having too many slots doing
> decoding on the source, then you might want the flexibility of pg_dump
> (or perhaps restoring a backup) to get more parallelism while having
> more control over how many slots will be used later.
>
> In your case, the whole setup is hopefully temporary, so maybe these
> particular concerns aren't as relevant to you.
>
> This is just what comes to mind... probably there's a few more things
> I'm not thinking of and hopefully others will chime in. :)

Yes, my setup would be temporary, just as long as needed to complete the migration to a new host.

One other use case is using pg_dump/restore to move older data in time-based partitions (assuming data isn't modified for older partition periods). Those older partitions would just need to be copied but wouldn't need any ongoing pub/sub/replication set up for them. So you'd have both in use in that case.

Thanks,
Don.

--
Don Seiler
www.seiler.us
Runaway Initial Table Syncs in Logical Replication?
Logical Rep question. Publisher is PG12 on Ubuntu 18.04, subscriber is PG15 on Ubuntu 22.04.

I bumped up some values to see how initial load times change. I set max_logical_replication_workers to 20 and max_sync_workers_per_subscription to 4. I'm using 3 subscriptions; 2 of the subscriptions have 3 tables each, the other has a lot more, say 100 for the sake of example.

What we noticed was that the subscriber basically maxed out the wal senders and replication slots on the publisher side, even when the publisher settings were bumped up to 50 each. 3 replication slots were for the 3 subs and the rest (47) were sync workers. Was it creating a sync worker for each table right away, or at least trying to? The subscriber side was still complaining that it couldn't create more replication slots on the publisher.

I was expecting to see max_sync_workers_per_subscription (4) x # of subs (3) = 12 sync workers in action, so this was a big surprise. Is this expected behavior?

--
Don Seiler
www.seiler.us
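[A sketch of watching the sync workers from the subscriber side; in pg_stat_subscription, rows with a non-null relid are tablesync workers, while the row with a null relid is the main apply worker.]

SELECT subname,
       pid,
       relid::regclass AS syncing_table
  FROM pg_stat_subscription
 ORDER BY subname, relid;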
Re: Runaway Initial Table Syncs in Logical Replication?
: session time: 0:00:00.009 user=migrator database=foo host=10.228.88.74 port=17390

and then this from a different pid, maybe some kind of cleanup process? It was doing this for what seemed like all of the sync slots:

2023-08-03 18:47:47.975 UTC [3303] migrator@foo - sub01 ERROR:  replication slot "pg_19742_sync_17238_7263122209699118815" does not exist
2023-08-03 18:47:47.975 UTC [3303] migrator@foo - sub01 STATEMENT:  DROP_REPLICATION_SLOT pg_19742_sync_17238_7263122209699118815 WAIT

--
Don Seiler
www.seiler.us
Re: Calculating vm.nr_hugepages
On Wed, Aug 30, 2023 at 8:12 AM Troels Arvin wrote:

> Hello,
>
> I'm writing an Ansible play which is to set the correct value for
> vm.nr_hugepages on Linux servers where I hope to make Postgres make use
> of huge pages.
>
> However, I'm struggling to find the right formula.
>
> I assume I need to find the same value as I get from running "postgres
> -C shared_memory_size_in_huge_pages". I call that my target value.
> Note: I cannot simply run "postgres -C ...", because I need my Ansible
> play to work against a server where Postgres is running.
>
> I've tried using the formula described at
> https://www.cybertec-postgresql.com/en/huge-pages-postgresql/, but it
> produces a different value than my target:
>
> Using a shared_buffers value of 21965570048, like in Cybertec
> Postgresql's example:
> "postgres ... -C 21965570048B" yields: 10719
> The formula from Cybertec Postgresql says: 10475
>
> I've also tried doing what ChatGPT suggested:
> Number of Huge Pages when shared_buffers is set to 1 GiB
>   = shared_buffers / huge_page_size
>   = 1073741824 bytes / 2097152 bytes
>   = 512
> But that's also wrong compared to "postgres -C ..." (which said 542).
>
> Which formula can I use? It's OK for me for it to be slightly wrong
> compared to "postgres -C", but if it's wrong, it needs to be slightly
> higher than what "postgres -C" outputs, so that I'm sure there's enough
> huge pages for Postgres to be able to use them properly.

Good morning Troels,

I had a similar thread a couple of years ago, you may want to read:
https://www.postgresql.org/message-id/flat/CAHJZqBBLHFNs6it-fcJ6LEUXeC5t73soR3h50zUSFpg7894qfQ%40mail.gmail.com

In it, Justin Przby provides the detailed code for exactly what factors into HugePages, if you require that level of precision. I hadn't seen that Cybertec blog post before.

I ended up using my own equation that I derived in that thread after Justin shared his info. The chef/ruby code involved is:

padding = 100
if shared_buffers_size > 4
  padding = 500
end

shared_buffers_usage = shared_buffers_size + 200 + (25 * shared_buffers_size / 1024)
max_connections_usage = (max_connections - 100) / 20
wal_buffers_usage = (wal_buffers_size - 16) / 2

vm.nr_hugepages = ((shared_buffers_usage + max_connections_usage + wal_buffers_usage + padding) / 2).ceil()

wal_buffers_size is usually 16MB so wal_buffers_usage ends up being zeroed out. This has worked out for our various postgres VM sizes. There's obviously going to be a little extra HugePages that goes unused, but these VMs are dedicated for postgresql usage and shared_buffers_size defaults to 25% of VM memory so there's still plenty to spare. But we use this so we can configure vm.nr_hugepages at deployment time via Chef.

Don.

--
Don Seiler
www.seiler.us
Logical Replication vs. Free Replication Slots
Good morning,

I'm trying to set up native logical replication from PG 12.15 (Ubuntu 18.04) to 15.4 (Ubuntu 22.04). I wanted to set up 3 publications to split up the work. I initially created the three publications with a handful of tables each.

On the source (PG 12.15) instance, we have bumped max_replication_slots and max_wal_senders to 50, and max_sync_workers_per_subscription to 10.

When I create the subscriptions to start the initial table sync, the third one failed to enable, with this error:

2023-09-29 21:07:45.861 UTC [1853362] ERROR:  could not find free replication state slot for replication origin with ID 11
2023-09-29 21:07:45.861 UTC [1853362] HINT:  Increase max_replication_slots and try again.
2023-09-29 21:07:45.861 UTC [1853362] LOG:  subscription "sub03" has been disabled because of an error

However when I look at the publication side, there are only 20 replication slots created (out of 50 max), 3 of which are the subscriptions and the rest are the tablesync workers. So I'm trying to make sense of why it would tell me to increase max_replication_slots when I don't appear to be anywhere near the max.

--
Don Seiler
www.seiler.us
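[A diagnostic sketch for comparing usage against the limit on each node; note that on the subscriber side, max_replication_slots also caps how many replication origins can be tracked, which is what the error above is counting.]

-- on the publisher: slots in use vs. the limit
SELECT count(*) AS slots_in_use,
       current_setting('max_replication_slots') AS max_replication_slots
  FROM pg_replication_slots;

-- on the subscriber: replication origins in use vs. the same limit
SELECT count(*) AS origins_in_use,
       current_setting('max_replication_slots') AS max_replication_slots
  FROM pg_replication_origin;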
Re: Logical Replication vs. Free Replication Slots
On Tue, Oct 3, 2023 at 10:27 AM Don Seiler wrote:

> On the source (PG 12.15) instance, we have bumped max_replication_slots
> and max_wal_senders to 50, and max_sync_workers_per_subscription to 10.

Forgot to note that on the subscriber (PG 15.4) instance, max_sync_workers_per_subscription is 4, and max_logical_replication_workers is 20. Per the docs, max_sync_workers_per_subscription would have no effect on the publisher side.

Don.

--
Don Seiler
www.seiler.us
malloc errors in psql with libpq 16.1 on Mac
I have a colleague trying to use psql to connect to a remote DB, but he's getting this error:

FATAL:  no PostgreSQL user name specified in startup packet
psql(42705,0x7ff84a07b700) malloc: *** error for object 0xa6: pointer being freed was not allocated
psql(42705,0x7ff84a07b700) malloc: *** set a breakpoint in malloc_error_break to debug

He was using psql 14 and we upgraded him to psql 15 but no change. I noticed he was on libpq 16.1 (installed via homebrew). Another colleague who was able to connect fine was on libpq 15.3. He upgraded to libpq 16.1 and boom, he now gets the same error. One colleague is on an older amd64 Mac, the other is on a newer arm64 Mac.

They are working to try to pin the older version of libpq, but this is probably going to bust out to a larger group as other homebrew users run their routine updates. Just wondering if anyone else has had similar experiences or if this is a known issue?

--
Don Seiler
www.seiler.us
Re: malloc errors in psql with libpq 16.1 on Mac
On Mon, Nov 27, 2023 at 6:02 PM Tom Lane wrote:

> Is OpenSSL in use? If so, what version? I'm wondering if this is a
> variant of the OpenSSL 3.2.0 compatibility problem we just heard of:

My colleague does confirm it is OpenSSL 3.2.0:

==> openssl@3: stable 3.2.0 (bottled)

Does this mean that downgrading libpq wouldn't help? He'd have to downgrade openssl instead (or wait for a fix from somewhere upstream)?

Don.

--
Don Seiler
www.seiler.us
Use of max_slot_wal_keep_size parameter
Good morning,

With our recent upgrade to PG 15, we've put the max_slot_wal_keep_size parameter into place, setting it about 3% lower than the size of the volume dedicated to pg_wal. However last night we had an incident where the volume filled up while we were performing a massive insert from one table into another.

I don't believe the replica was lagging, and we have pgbackrest configured for async archiving, which I assume was fine. It seemed to just be a matter of PG not cleaning up the WAL. Our understanding was that max_slot_wal_keep_size would prevent this but perhaps it only deals with situations where the replication slot lag is a factor?

Don.

--
Don Seiler
www.seiler.us
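[On PG 13 and later, the slot view reports how close each slot is to the configured limit, which can help tell slot retention apart from other causes of pg_wal growth; a sketch.]

SELECT slot_name,
       slot_type,
       active,
       wal_status,   -- 'reserved', 'extended', 'unreserved', or 'lost'
       pg_size_pretty(safe_wal_size) AS safe_wal_size
  FROM pg_replication_slots;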
Re: Use of max_slot_wal_keep_size parameter
On Tue, Mar 26, 2024 at 9:09 AM Tom Lane wrote:

> My immediate reaction is that 3% is a mighty small margin for error.
> I don't know exactly how max_slot_wal_keep_size is enforced these
> days, but in the past restrictions like that were implemented by
> deciding during a checkpoint whether to unlink a no-longer-needed WAL
> file (if we had too much WAL) or rename/recycle it to become a future
> WAL segment (if not). So you could overshoot the specified target by
> more or less the amount of WAL that could be emitted between two
> checkpoints. Perhaps it's tighter nowadays, but I really doubt that
> it's exact-to-the-kilobyte-at-all-times.

In this case, the total volume size was 60GB and we had the parameter set to 58GB, but I imagine that can still be overwhelmed quickly. Maybe we should target a 20% buffer zone? We have wal_keep_size defaulted at 0.

Thanks,
Don.

--
Don Seiler
www.seiler.us
Dangerous Naming Confusion
Good evening,

Please see my gist at https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056 for a complete test case. I tested this on PG 12.6 and 13.2 and observed the same on both.

We were expecting the queries that use dts_temp to only return 3 rows. However the subquery starting at line 36 returns ALL 250,000 rows from dts_orders. Note that the "order_id" field doesn't exist in the dts_temp table, so I'm assuming PG is using the "order_id" field from the dts_orders table. If I use explicit table references like in the query at line 48, then I get the error I would expect that the "order_id" column doesn't exist in dts_temp. When I use the actual column name "a" for dts_temp, then I get the 3 rows back as expected.

I'm wondering if this is expected behavior that PG uses the dts_orders.order_id value in the subquery "select order_id from dts_temp" when dts_temp doesn't have its own order_id column. I would have expected an error that the column doesn't exist. Seems very counter-intuitive to think PG would use a column from a different table.

This issue was discovered today when this logic was used in an UPDATE and ended up locking all rows in a 5M row table and brought many apps to a grinding halt. Thankfully it was caught and killed before it actually updated anything.

Thanks,
Don.

--
Don Seiler
www.seiler.us
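[A condensed sketch of the behavior being described, using the table names from the gist but with a simplified shape; the exact DDL there may differ.]

-- dts_temp has no order_id column. The unqualified order_id silently resolves to
-- dts_orders.order_id from the outer query (a correlated reference), so the IN
-- condition matches every row of dts_orders.
SELECT count(*) FROM dts_orders
 WHERE order_id IN (SELECT order_id FROM dts_temp);

-- Qualifying the column surfaces the real problem instead:
-- ERROR: column t.order_id does not exist
SELECT count(*) FROM dts_orders o
 WHERE o.order_id IN (SELECT t.order_id FROM dts_temp t);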
Re: Dangerous Naming Confusion
On Mon, Mar 29, 2021 at 5:22 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Mon, Mar 29, 2021 at 3:20 PM Adrian Klaver > wrote: > >> On 3/29/21 3:00 PM, Don Seiler wrote: >> > >> > I'm wondering if this is expected behavior that PG uses the >> > dts_orders.order_id value in the subquery "select order_id from >> > dts_temp" when dts_temp doesn't have its own order_id column. I would >> > have expected an error that the column doesn't exist. Seems very >> > counter-intuitive to think PG would use a column from a different table. >> >> See: >> >> >> https://www.postgresql.org/message-id/pine.lnx.4.56.0308011345320@krusty.credativ.de >> >> > There is also an FAQ entry: > > > https://wiki.postgresql.org/wiki/FAQ#Why_doesn.27t_PostgreSQL_report_a_column_not_found_error_when_using_the_wrong_name_in_a_subquery.3F > > David J. > > This is good to know. I figured it might be "working as expected". Still seems annoying/counter-intuitive to me but at least we know to look for it. Thanks, Don. -- Don Seiler www.seiler.us
Could not read block in file
This morning I got some errors from our pre-prod environment. This is running Pg 9.6.6 on CentOS 7.3. Just grepping out these errors (and obfuscating sensitive data):

2018-01-18 06:29:21 CST [11912]: [1570-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 39 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:29:42 CST [11959]: [517-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 40 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:30:09 CST [11912]: [1577-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 41 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:30:32 CST [11970]: [162-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 42 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:30:40 CST [11912]: [1583-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 43 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:31:04 CST [11970]: [168-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 44 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:31:31 CST [11959]: [595-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 45 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes

All of these errors occurred during attempts to insert into this liferay table. For example:

2018-01-18 06:29:21 CST [11912]: [1568-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com LOG: execute S_14: INSERT INTO quartz_FIRED_TRIGGERS (ENTRY_ID, TRIGGER_NAME, TRIGGER_GROUP, IS_VOLATILE, INSTANCE_NAME, FIRED_TIME, STATE, JOB_NAME, JOB_GROUP, IS_STATEFUL, REQUESTS_RECOVERY, PRIORITY) VALUES($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
2018-01-18 06:29:21 CST [11912]: [1569-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com DETAIL: parameters: $1 = 'FOO', $2 = 'Bar CSVTrigger', $3 = 'DEFAULT', $4 = 'f', $5 = 'FOO', $6 = '123', $7 = 'BLAH', $8 = NULL, $9 = NULL, $10 = 'f', $11 = 'f', $12 = '5'
2018-01-18 06:29:21 CST [11912]: [1570-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 39 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:29:21 CST [11912]: [1571-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com STATEMENT: INSERT INTO quartz_FIRED_TRIGGERS (ENTRY_ID, TRIGGER_NAME, TRIGGER_GROUP, IS_VOLATILE, INSTANCE_NAME, FIRED_TIME, STATE, JOB_NAME, JOB_GROUP, IS_STATEFUL, REQUESTS_RECOVERY, PRIORITY) VALUES($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)

These are the only occurrences of these errors. I've since been able to query this table as well as take a pg_dump of the schemas on this tablespace, with no errors either time.
So I can't seem to duplicate this problem. The app has successfully performed many, many inserts into that table since these errors. I don't see any errors in /var/log/messages that would indicate any filesystem issues either. Obviously the worry is possible corruption. Granted this is "only" pre-prod, but there will be concerns from the business side prior to upgrading our prod DB (currently 9.2.22). What else can/should I check here to make sure there isn't something wrong with this database cluster? -- Don Seiler www.seiler.us
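As a hedged starting point (not from the original thread), the file named in the error can be mapped back to a relation and then read end-to-end. Here 16406 is the tablespace OID and 25690536 the filenode taken from the error text, and this assumes you are connected to the database whose OID is 16416:

    -- Identify which relation the failing file belongs to
    SELECT pg_filenode_relation(16406, 25690536) AS suspect_relation;

    -- Then force a full read of it, e.g. with a sequential scan; if it turns out
    -- to be an index rather than a table, a REINDEX would rebuild it from the heap.
    -- SELECT count(*) FROM <suspect_relation>;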
vacuumdb --all Parallel Feature Request
Good afternoon folks. I've been playing around with some vacuumdb options this week as part of post-upgrade testing, in particular with parallel mode (--jobs=N). I noticed that vacuumdb --all will only work on one database at a time. This means that as it winds down to the last few tables in a particular database, workers sit idle. For example, I ran with 8 jobs, but while 1 job runs on the last big table of a particular database, the other 7 are doing nothing. It would be great if those idle jobs could move on to the next database to get a head start on the tables there. When you have big tables in multiple databases in the cluster, it could make a big difference. Also, running analyze_new_cluster.sh mentions this:

> If you would like default statistics as quickly as possible, cancel
> this script and run:
>     "/usr/pgsql-9.6/bin/vacuumdb" --all --analyze-only

I wonder if it would be worth also mentioning the --jobs=N option in that blurb. Yes, it's in the --help text, but it wouldn't hurt to highlight its availability. Don. -- Don Seiler www.seiler.us
pgBackRest backup from standby
Evening all. Looking to use pgBackRest to take a backup from a hot standby. I'm reading that pgBackRest still needs to connect to the primary and copy some files. My questions are:

1. What files does it need to copy? Config files? WAL files?
2. How does it connect? SSH?
3. Does pgBackRest need to be installed and configured on the primary as well?

Thanks, Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Sun, Feb 18, 2018 at 6:43 PM, Michael Paquier wrote: > > I am adding in CC: Stephen Frost and David Steele who work on the tool. > I assumed Stephen was already on this list, and I communicate with him regularly on Slack as well, but I'm just throwing this out there on a Sunday night. > You may want to contact the maintainers directly through github where > the project is maintained: > https://github.com/pgbackrest/pgbackrest Is that the place to just ask questions? I wasn't sure that "Issues" meant just questions versus problems or bugs. I didn't see any mention of a forum or list on their website, and there have been lots of pgBackRest questions on this list in the past, so I settled on this one. Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 8:18 AM, David Steele wrote: > It copies files that are not replicated from the primary so that a > primary-style backup is created. Anything that is replicated (which is > by far the bulk of the data) is copied from the standby. > OK so all data files would be copied from standby. Can you give me an example of the types of files that need to be copied from primary? > it's best to archive from the primary so a replication > failure does not affect your archiving. > Understood, just not something I can change in production primary at the moment. Hence looking to see about a quick one-off backup from standby. > Configuring pgBackRest, SSH, standby, and backup from standby are all > covered in the user guide. Thanks, I've been through it a few times and played with some test backups from primary clones. I just ditched my master/replica clone setup but I'll test there as well. I just had a couple questions about the mechanics. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 8:23 AM, David Steele wrote: > > Either is fine with me, but as Michael says I might miss postings to > -general. I'm sure somebody else would catch it, though. > OK, I'll make use of the issues tracker going forward. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 8:53 AM, David Steele wrote: > Anything *not* in global (except pg_control), base, pg_tblspc, > pg_xact/pg_clog, and pg_multixact are copied from the primary. > > For example, pg_stat is copied from the primary so these stats are > preserved on a standby backup. > So if I have tablespaces outside of $PGDATA (but symlinked from within pg_tblspc, of course), those will still be backed up from the standby, right? Is it right to say that the files that would be copied from primary are typically very small? So it isn't a huge transfer over the WAN (in my case)? > pgBackRest uses all the same exclusions as pg_basebackup, so many > dirs/files are not copied at all: pg_dynshmem, pg_notify, pg_replslot, > pg_serial, pg_snapshots, pg_stat_tmp, pg_subtrans, etc. > > Full list here > https://www.postgresql.org/docs/10/static/protocol-replication.html. > > > it's best to archive from the primary so a replication > > failure does not affect your archiving. > > > > Understood, just not something I can change in production primary at the > > moment. Hence looking to see about a quick one-off backup from standby. > > For a quick one-off, pg_basebackup is your friend. One of the requirements of this backup is encryption, which I don't see any notes for with pg_basebackup. Also, due to the size, I need parallel workers. pgBackRest gives me both of these. I need compression as well, but pg_basebackup does do that. I did come up with a sort of Rube Goldberg-esque workaround for now that uses a clone of the prod standby VM from Veeam backup as the backup source (after stopping recovery and opening it as a standalone DB). Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 9:21 AM, David Steele wrote: > > Yes, they are typically very small. The general exception to this rule > is if logs are stored in pg_log. I recommend storing logs out of the > PGDATA dir as they can be quite large and don't really make sense to > restore to another server. > > Files copied from the master will be marked as such in backup.manifest > (master:true) so you can check for yourself. > Good to know. And fortunately for this DB we do have pg_log (and pg_xlog) symlinked to different volumes outside of $PGDATA. > I did come up with a sort of Rube Goldberg-esque workaround for now > > involving using a clone of the prod standby VM from Veeam backup to use > > as the backup source (after stopping recovery and opening it as a > > standalone DB). > > You don't get PITR that way, of course, but at least it's a backup. As > long as your clone is consistent. Yes it's a crash-consistent snapshot-based backup. I've done quite a few restores from it and it works great. It can do PITR as well since I would have all the WAL files from prod needed to keep recovering. But for these cases I just recover it to the first consistent point and open it for testing (or backups in this case). Thanks for all your help! Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 12:39 PM, David Steele wrote: > > > I read "open it for testing (or backups in this case)" as letting > recovery complete and promoting the cluster to a master before taking > the backup. > > Don, is that the case? If it is, I think there's a problem with or > without a timeline switch. If you confirm the backup is being taken as > above then I'll detail my concerns. > Note that this is just for creating a couple of one-off backups to restore for our dev and pre-prod environments. Given that, I was going to open a new clone as its own cluster and take backups from that. The data would be the same though and suit purposes of the dev and pre-prod refreshes. If I were taking backups for the purpose of production backups, I would not do things this way. That is the eventual plan but right now we aren't ready to make the changes necessary in the production environment. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 10:17 AM, David Steele wrote: > If pg_log is symlinked to PGDATA it will be copied. pg_xlog is not > copied in any backup. So an external pg_log directory symlinked into $PGDATA will have its log contents copied? I'm curious, why even copy the pg_log logs? They aren't needed for database restore or recovery. Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 2:53 PM, David Steele wrote: > > Also, relocating the log directory is easy using the log_directory > setting, so that's what I recommend if it's an issue. Some users do > want to backup their logs. > > That's probably a much better idea than symlinking anyway. I'll look to do that in my next round of config changes. -- Don Seiler www.seiler.us
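A minimal sketch of that config change; the path here is purely illustrative:

    -- Point the logging collector at a directory outside PGDATA instead of symlinking pg_log
    ALTER SYSTEM SET log_directory = '/var/log/postgresql';
    SELECT pg_reload_conf();
    SHOW log_directory;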
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 11:23 AM Peter Geoghegan wrote: > > If a substantial amount of the index was written by CREATE INDEX (and > not by retail inserts) then my theory is unlikely to be correct. It > could just be that you managed to absorb most inserts in one > partition, but not in the other. That's probably possible when there > are only relatively small differences in the number of inserts that > need to use of the space left behind by fillfactor in each case. In > general page splits tend to come in distinct "waves" after CREATE > INDEX is run. > What do you mean by "absorb" the inserts? It sounds like the answer will be "No", but: Would rebuilding the index after the month-end (when inserts have stopped on this partition) change anything? Don. -- Don Seiler www.seiler.us
Index Partition Size Double of its Table Partition?
We're trying out a new non-unique covering index (with INCLUDE columns) on a couple of table partitions. We put the index on the partitions for last month and this month. Both table partitions have similar sizes (45-46 GB) and row counts (330-333 million). The covering index on last month's partition is 50GB, but this month's index is already 79GB. The table is basically write-only as well, so there shouldn't be any real bloat here. One thing worth mentioning is that the table has 4 columns; the index is on two of them and includes the other two. I can't think of an explanation for the index being so much larger than its table, especially compared to last month's index. Curious if anyone has any thoughts on what might be causing this. -- Don Seiler www.seiler.us
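One hedged way to quantify the gap is to compare sizes directly and check leaf density with pgstattuple; the partition and index names below are placeholders, not the real ones from this system:

    SELECT pg_size_pretty(pg_relation_size('orders_2024_10')) AS table_size,
           pg_size_pretty(pg_relation_size('orders_2024_10_cover_idx')) AS index_size;

    -- avg_leaf_density shows how full the leaf pages are; a freshly built btree at the
    -- default fillfactor sits near 90, while heavy page splitting drags it toward 50.
    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT avg_leaf_density, leaf_fragmentation
      FROM pgstatindex('orders_2024_10_cover_idx');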
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 10:45 AM Peter Geoghegan wrote: > > It sounds like you have no updates and deletes. Right? So the only > thing that could be different is the way that the pages are being > split (aside from variations in the width of index tuples, which seems > highly unlikely to be the only factor). > Correct, the table only sees inserts as far as DML goes. > The heuristics used to trigger the relevant behavior are fairly > conservative. I wonder if we should be more aggressive about it. > > > I should have also included that this is on PG 15 (currently 15.8 but we > created the indexes when it was still 15.7) on Ubuntu 22.04 LTS. > > That shouldn't matter, as far as this theory of mine is concerned. > Anything after 12 could be affected by the issue I'm thinking of. > Why would last month's index be so much smaller? Both indexes were created using CONCURRENTLY, as each was created during its month when we started testing. The September index was created toward the end of the month (Sep 26), whereas the October one was created Oct 1. Both table partitions are getting regularly autovacuum/autoanalyze work. Don. -- Don Seiler www.seiler.us
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 10:35 AM Peter Geoghegan wrote: > On Wed, Oct 30, 2024 at 11:24 AM Don Seiler wrote: > > One thing worth mentioning is that the table is 4 columns, the index is > on two of them and includes the other two. I can't think of an explanation > for the index being so much larger than its table, especially compared to > last month's index. > > > > Curious if anyone has any thoughts on what might be causing this. > > You mentioned that this has 4 columns. Sounds like this could be a > "locally monotonically increasing index". I wonder if you're > benefiting from this optimization, though only inconsistently: > > https://www.youtube.com/watch?v=p5RaATILoiE&t=2079s > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f21668f3 > > Of course, this is only a guess. I vaguely recall a complaint that > sounded vaguely like yours, also involving partitioning. > Thanks Peter, I'll look into that shortly. I should have also included that this is on PG 15 (currently 15.8 but we created the indexes when it was still 15.7) on Ubuntu 22.04 LTS. Don. -- Don Seiler www.seiler.us
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 4:59 PM David Mullineux wrote: > Are you able to cluster the table ? The idea is that rows ordered in the > same way as the index might reduce it's size ? > I'm not sure on this. There are other indexes on these table partitions as well. Another bit of useful info that I should have shared immediately is that this is a monthly partitioned table, going back years. We don't drop old partitions (yet) on this one. For now we've only added this index to a few individual partitions. The hope was to add it to all of them and then eventually the template (using an older version of pg_partman). Don. -- Don Seiler www.seiler.us
Re: Index Partition Size Double of its Table Partition?
On Thu, Oct 31, 2024 at 11:13 AM Rob Sargent wrote: > Whats the fill factor? > No fill factor is specified, so I'm assuming it's the default 90% for indexes. FYI we did a REINDEX for the index in question tonight. Since the index was for last month, there are no more writes to it so we didn't use CONCURRENTLY either. The size went from 83GB to 48GB, which also happens to be the size of the table partition. -- Don Seiler www.seiler.us
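For reference, a sketch of the checks and the rebuild described above; the index name is a placeholder:

    -- NULL reloptions means the index uses the default fillfactor (90 for btree)
    SELECT relname, reloptions
      FROM pg_class
     WHERE relname = 'orders_2024_09_cover_idx';

    -- Writes to this partition have stopped, so a plain (blocking) REINDEX is fine here;
    -- on a partition still receiving writes, REINDEX INDEX CONCURRENTLY (PG 12+) avoids blocking them.
    REINDEX INDEX orders_2024_09_cover_idx;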