Archiving Data to Another DB?
Let's say I have two DBs: main (9.6.6) and archive (10.2). I have a table in main where I want to archive data older than 60 days. For various reasons, the table is not partitioned, so for now we must use DELETE. The destination table in the archive DB is partitioned with the new Pg10 partitioning.

My initial plan was to have a stored procedure on the archive DB use postgres_fdw to do an INSERT INTO / SELECT to select the data remotely and insert into the local archive table. It would then issue a single DELETE command to remotely remove the data from the main DB. However I found that doing this resulted in the main DB calling thousands (perhaps millions if it's one-per-row) of individual DELETE statements based on a ctid column. Aside from WAL behavior concerns, it is flooding my postgresql server logs since I log any DML.

On top of that, I'm told that a remote DELETE wouldn't be transactional, so if I were to compare inserted rows vs deleted rows and found a mismatch, I couldn't just roll back the DELETE. I plan to verify this with a small test case later but for now I'll assume this to be true.

Right now I'm thinking of falling back to the far-less-elegant method of dumping the data to a flat file via COPY, running psql to connect to the archive DB remotely and running a COPY to load the data (or maybe transferring the flat file to the archive DB to load it there, offloading that part of the workload), then deleting the data from the main DB. I could capture the rows dumped in a control table and compare the rows deleted against that and then roll back the delete if necessary. Like I said, not elegant, but I don't want to risk losing data that wasn't successfully archived to the archive DB.

I'm very interested to hear what others might be doing for tasks like this.

Don.

--
Don Seiler
www.seiler.us
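[For reference, a minimal sketch of the copy/load/delete fallback described above. The table names, file path, and fixed cutoff timestamp are hypothetical; the verification against a control table is omitted here. Pinning the cutoff to a literal timestamp keeps the dump and the delete working on the same row set.]

-- in psql connected to the main DB: dump the candidate rows to a flat file
\copy (SELECT * FROM big_table WHERE created_at < '2018-02-10 00:00:00') TO '/tmp/archive_batch.csv' CSV

-- in psql connected to the archive DB: load the flat file into the partitioned archive table
\copy archive_schema.big_table FROM '/tmp/archive_batch.csv' CSV

-- back on the main DB, after confirming the loaded row count matches the dumped count:
BEGIN;
DELETE FROM big_table WHERE created_at < '2018-02-10 00:00:00';
-- ROLLBACK here instead if the counts don't match
COMMIT;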
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 12:58 PM, Adrian Klaver wrote:

> "F.33.3. Transaction Management
>
> During a query that references any remote tables on a foreign server,
> postgres_fdw opens a transaction on the remote server if one is not already
> open corresponding to the current local transaction. The remote transaction
> is committed or aborted when the local transaction commits or aborts.
> Savepoints are similarly managed by creating corresponding remote
> savepoints.
>
> ..."

Interesting, I'll work on a test case later!

> I may be missing something, but why not reverse your original set up?
> Assuming transactional behavior works as expected something like:
>
> 1) Setup postgres_fdw in main database.
>
> 2) Create FOREIGN TABLE pointing to table in archive database.
>
> 3) INSERT INTO/SELECT from main table to archive table.
>
> 4) DELETE FROM main table.

I had considered this as well, as this would allow me to roll back the delete (assuming my intel on postgres_fdw transactions was correct, which it may not be after all). I wondered if a remote insert would be broken up into individual inserts like the remote delete was, as that would be equally unappealing for the same reasons. But obviously worth confirming.

Don.

--
Don Seiler
www.seiler.us
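[A minimal sketch of the reversed setup Adrian describes, run from the main database. The server name, connection options, credentials, and column list are all hypothetical.]

CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER archive_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive-host', dbname 'archive', port '5432');

CREATE USER MAPPING FOR CURRENT_USER SERVER archive_srv
    OPTIONS (user 'archiver', password 'secret');

-- points at the partitioned table in the archive DB
CREATE FOREIGN TABLE archive_big_table (
    id         bigint,
    created_at timestamp,
    payload    text
) SERVER archive_srv OPTIONS (schema_name 'public', table_name 'big_table');

BEGIN;
INSERT INTO archive_big_table
    SELECT * FROM big_table WHERE created_at < '2018-02-10 00:00:00';
DELETE FROM big_table WHERE created_at < '2018-02-10 00:00:00';
COMMIT;  -- per the quoted docs, the remote transaction commits or aborts with the local one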
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 1:13 PM, Don Seiler wrote:

> I had considered this as well, as this would allow me to roll back the
> delete (assuming my intel on postgres_fdw transactions was correct, which
> it may not be after all). I wondered if a remote insert would be broken up
> into individual inserts like the remote delete was, as that would be
> equally unappealing for the same reasons. But obviously worth confirming.

I have now confirmed that a remote INSERT does get broken up into a single INSERT for each row.

--
Don Seiler
www.seiler.us
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 1:33 PM, Adrian Klaver wrote:

> A test case here confirms it sends individual INSERTS:
>
> test_(postgres)# insert into fdw_test_table select * from fdw_test;
> INSERT 0 3
>
> Where fdw_test_table is the remote table and fdw_test is the local one.
>
> postgres-2018-04-11 11:29:23.812 PDT-0LOG:  statement: insert into
> fdw_test_table select * from fdw_test;
> postgres-2018-04-11 11:29:23.812 PDT-0LOG:  execute pgsql_fdw_prep_2:
> INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
> postgres-2018-04-11 11:29:23.812 PDT-0DETAIL:  parameters: $1 = '1', $2 = 'one', $3 = 't'
> postgres-2018-04-11 11:29:23.813 PDT-10140LOG:  execute pgsql_fdw_prep_2:
> INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
> postgres-2018-04-11 11:29:23.813 PDT-10140DETAIL:  parameters: $1 = '2', $2 = 'two', $3 = 'f'
> postgres-2018-04-11 11:29:23.813 PDT-10140LOG:  execute pgsql_fdw_prep_2:
> INSERT INTO public.fdw_test_rcv(id, fld_1, fld_2) VALUES ($1, $2, $3)
> postgres-2018-04-11 11:29:23.813 PDT-10140DETAIL:  parameters: $1 = '3', $2 = 'three', $3 = 'f'
>
> So much for that idea(:=

Yeah, I saw the same with a 132 row insert. Now imagine that with a monthly 50 million row insert or delete. :p

Thanks for the confirmation! I'm definitely leaning towards the copy/load/delete method.

Don.

--
Don Seiler
www.seiler.us
Re: Archiving Data to Another DB?
On Wed, Apr 11, 2018 at 1:38 PM, Don Seiler wrote:

> Yeah, I saw the same with a 132 row insert. Now imagine that with a
> monthly 50 million row insert or delete. :p Thanks for the confirmation!

I went back to look at the postgres logs on my dev server. These logs are rotated once they hit 2G in size. One typical log from the middle of my test last night hit that in 13 minutes and had over 5.2 million DELETE calls. There are quite a few logs like this.

That would not be fun for the disk space on the log volume, either.

Don.

--
Don Seiler
www.seiler.us
Re: New website
On Wed, Apr 18, 2018 at 5:03 PM, Adrian Klaver wrote:

> I would contact the Webmaster but Contact goes to a big image of an
> elephant head. That is also where Downloads, Support and Donate lands.
> Might have been a good idea to roll out a demo site for testing first. Will
> reserve judgment on the site design until it is functioning.

Those links work fine for me on the new site. I haven't hit any issues yet in my casual click-testing.

--
Don Seiler
www.seiler.us
Re: initdb failing (10.4 centos7)
On Sat, May 19, 2018 at 2:10 PM, Michael Nolan wrote:

> initdb is failing on Centos 7 with 10.4 because the install procedure does
> not change the ownership of the /usr/local/pgsql directory to postgres.
>
> Changing the ownership fixes the problem, but the install procedure should
> do this.

Are you using the PGDG repo packages, or the default CentOS repo packages? You should use PGDG and those should install under /var/lib/pgsql.

Don.

--
Don Seiler
www.seiler.us
Re: Bad performance with cascaded deletes
On Tue, Jun 12, 2018 at 2:26 AM, Haug Bürger wrote:

> "availability_cache_ac_session_id_fkey" FOREIGN KEY (ac_session_id)
> REFERENCES zpg_data.session(id) ON DELETE CASCADE

Do you have an index on availability_cache.ac_session_id? These fields are not automatically indexed and that can lead to horrible performance on cascading operations like this.

--
Don Seiler
www.seiler.us
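[If such an index were missing, a minimal sketch of the fix; the index name and the assumption that availability_cache lives in the zpg_data schema are taken from the constraint definition above, not confirmed.]

CREATE INDEX availability_cache_ac_session_id_idx
    ON zpg_data.availability_cache (ac_session_id);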
Re: Bad performance with cascaded deletes
On Tue, Jun 12, 2018 at 10:48 AM, Don Seiler wrote:

> On Tue, Jun 12, 2018 at 2:26 AM, Haug Bürger wrote:
>
>> "availability_cache_ac_session_id_fkey" FOREIGN KEY (ac_session_id)
>> REFERENCES zpg_data.session(id) ON DELETE CASCADE
>
> Do you have an index on availability_cache.ac_session_id? These fields
> are not automatically indexed and that can lead to horrible performance on
> cascading operations like this.

I'm blind apparently, it's your PK.

--
Don Seiler
www.seiler.us
Re: Issues on 9.5 to 9.6 Upgrade.
On Wed, Jun 13, 2018 at 11:16 AM, wrote:

> Can anyone explain WHY there is such a big difference? Is it the SQL
> statement or a bug in the pgdump ?

Did you analyze the database after upgrading? Or at least the tables in question? Those are very different plans.

--
Don Seiler
www.seiler.us
Vacuum process waiting on BufferPin
Postgres 9.6.6 on CentOS 7

This afternoon I discovered an autovacuum process that had been running for over 6 days. It was waiting on a BufferPin event. I kicked off a manual vacuum+analyze of the table, which automatically killed that autovacuum. This ran for a few minutes before it too was waiting on a BufferPin event. I've never witnessed a vacuum session waiting on BufferPin before.

In pg_locks, I see a handful of sessions that have an AccessShareLock on the table I'm trying to vacuum. My vacuum session has a ShareUpdateExclusiveLock on that relation. All of those sessions look like orphaned reporting sessions sitting "idle in transaction". It's unclear to me why a report job would end up idle in transaction; to my knowledge we don't disable autocommit from the reporting app server.

Anyway, my next step is getting the OK to terminate those idle-in-transaction sessions to see if that gets my vacuum job moving. Meanwhile I'll ask a dev to sort out why they might be sitting idle in transaction; there's no reason for them to be unless the app server connection died and they are zombies. However I'm curious if there is someplace else I could/should also look to get to the root cause of this.

Don.

--
Don Seiler
www.seiler.us
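[A sketch of the kind of pg_locks/pg_stat_activity join used to spot the sessions holding locks on the table being vacuumed; the table name is hypothetical.]

SELECT a.pid,
       a.state,
       a.xact_start,
       l.mode,
       l.granted,
       left(a.query, 60) AS last_query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE l.relation = 'some_busy_table'::regclass
 ORDER BY a.xact_start;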
Re: Vacuum process waiting on BufferPin
On Mon, Aug 13, 2018 at 4:15 PM, Alvaro Herrera wrote:

> Maybe you had a cursor that was not fully scanned before the session was
> left idle -- as I recall, those can leave buffers pinned.

I don't quite follow this. What circumstances would lead to this situation? For what it's worth, these sessions are backend reporting jobs, not user-facing at all.

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Mon, Aug 13, 2018 at 4:04 PM, Don Seiler wrote:

> Anyway, my next step is getting the OK to terminate those idle in
> transaction sessions to see if that gets my vacuum job moving. Meanwhile
> I'll ask a dev to sort out why they might be sitting idle in transaction,
> there's no reason for them to be unless the app server connection died and
> they are zombies. However I'm curious if there is someplace else I
> could/should also look to get to the root cause of this.

FYI, killing those idle-in-transaction sessions did free up the vacuum job, which then finished almost immediately afterward.

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Mon, Aug 13, 2018 at 5:55 PM, Vick Khera wrote:

> On Mon, Aug 13, 2018 at 5:19 PM, Don Seiler wrote:
>
>> I don't quite follow this. What circumstances would lead to this
>> situation?
>
> BEGIN WORK;
> DECLARE CURSOR ... ;
> FETCH ...; -- for some number of fetches, which does not reach the end of the cursor.
>
> then just sit there idle, without having closed the cursor or fetching
> anything more.

So the fix in that case would be to ensure that they CLOSE the cursors when done with them?

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Tue, Aug 14, 2018 at 8:23 AM, Vick Khera wrote:

> The general fix is to never sit idle in transaction, but this specific
> case closing the cursor seems like it will also do it.

Of course. I've let development know that they need to sort out why it's left that way. But if they're also NOT closing cursors, that seems like another bad practice to correct. Would commit/rollback automatically close cursors opened in that transaction?

Don.

--
Don Seiler
www.seiler.us
Re: Vacuum process waiting on BufferPin
On Tue, Aug 14, 2018 at 8:58 AM, Vik Fearing wrote:

> There is an idle_in_transaction_session_timeout parameter to kill
> connections that are idle (in transaction) for too long. It was
> implemented specifically for cases like this.

Thanks for reminding me of this. I'll probably look to set it but make sure to point out that it is just a safety net to let DB maintenance run and they should make sure their work is committed cleanly if they want to keep it.

Don.

--
Don Seiler
www.seiler.us
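[A sketch of setting that safety net; the 30-minute value and the role name are only illustrative choices.]

-- cluster-wide default for new sessions; applied with a reload, no restart needed
ALTER SYSTEM SET idle_in_transaction_session_timeout = '30min';
SELECT pg_reload_conf();

-- or scope it to just the reporting role so other sessions are unaffected
ALTER ROLE reporting_user SET idle_in_transaction_session_timeout = '30min';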
Odd Row Estimates in Query Plan (rows=75)
PostgreSQL 9.6.6 on CentOS.

We have a report query that has gone from maybe a few seconds to run to a few minutes to run since mid-July. Looking at the output of EXPLAIN ANALYZE, the row count estimates are way off, even though this table was just analyzed a day or so ago. What's more bizarre to me is that the row count estimate is *always* 75 for every node of the plan, where the actual rows is in the hundreds or thousands. This table is one of the busiest tables in our production database (many inserts and updates). It is autovacuumed and autoanalyzed a few times per week, although I'm looking to change it to a nightly manual schedule to avoid daytime autovacuums.

Hash Join  (cost=1869142.34..1869146.15 rows=75 width=88) (actual time=179877.869..179878.011 rows=759 loops=1)
  Hash Cond: (stores.pkey = lt.store_pkey)
  Buffers: shared hit=1654593 read=331897 dirtied=249
  ->  Seq Scan on stores  (cost=0.00..2.77 rows=77 width=22) (actual time=0.007..0.023 rows=78 loops=1)
        Buffers: shared hit=2
  ->  Hash  (cost=1869141.40..1869141.40 rows=75 width=50) (actual time=179877.847..179877.847 rows=759 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 73kB
        Buffers: shared hit=1654591 read=331897 dirtied=249
        ->  Subquery Scan on lt  (cost=1869138.59..1869141.40 rows=75 width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
              Buffers: shared hit=1654591 read=331897 dirtied=249
              ->  GroupAggregate  (cost=1869138.59..1869140.65 rows=75 width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
                    Group Key: lts.store_pkey, lts.owner, (date_trunc('minute'::text, lts.date_gifted))
                    Filter: (count(*) IS NOT NULL)
                    Buffers: shared hit=1654591 read=331897 dirtied=249
                    ->  Sort  (cost=1869138.59..1869138.78 rows=75 width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
                          Sort Key: lts.store_pkey, lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
                          Sort Method: quicksort  Memory: 757kB
                          Buffers: shared hit=1654591 read=331897 dirtied=249
                          ->  Index Scan using gifts_date_added on gifts lts  (cost=0.56..1869136.25 rows=75 width=42) (actual time=190.657..179870.165 rows=6731 loops=1)
                                Index Cond: ((date_added > '2018-07-14 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13 14:14:21'::timestamp without time zone))
                                Filter: ((date_gifted >= '2018-08-13 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13 14:14:21'::timestamp without time zone))
                                Rows Removed by Filter: 938197
                                Buffers: shared hit=1654591 read=331897 dirtied=249
Planning time: 0.426 ms
Execution time: 179893.894 ms

I don't have a version of this query from prior to this summer, but getting explain plans for older data from older sandboxes shows a similar plan.

Sidenote: I am suggesting that an index be added on the date_gifted field as that is far more selective and avoids throwing rows away. However I'm very interested in why every node dealing with the gifts table thinks rows=75 when the actual is much, much higher. And 75 seems like too round of a number to be random?

--
Don Seiler
www.seiler.us
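[A quick sketch of checking when the table was last analyzed and how much it has churned since, which can help rule stale statistics in or out; the table name is taken from the plan above.]

SELECT relname,
       last_analyze,
       last_autoanalyze,
       n_live_tup,
       n_dead_tup,
       n_mod_since_analyze
  FROM pg_stat_user_tables
 WHERE relname = 'gifts';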
Re: Odd Row Estimates in Query Plan (rows=75)
Here's the query, obfuscated manually by me:

SELECT 'Foo' as system_function,
    stores.name as store,
    lt.owner,
    lt.minute_of_day,
    lt.records
FROM foo.stores
LEFT OUTER JOIN
    (SELECT lts.store_pkey,
        lts.owner,
        date_trunc('minute', lts.date_gifted) as minute_of_day,
        count(*) as records
    FROM foo.gifts lts
    WHERE lts.date_added > '2017-07-14 11:13:05'
    AND lts.date_added < '2017-08-13 14:14:21'
    AND lts.date_gifted >= '2017-08-13 11:13:05'
    AND lts.date_gifted < '2017-08-13 14:14:21'
    GROUP BY 1,2,3
    ORDER BY 1
    ) lt ON lt.store_pkey = stores.pkey
WHERE lt.records IS NOT NULL;

The foo.gifts table is pretty much the core table of our database. It's big and very active. There is an index on date_added but not yet on date_gifted. I'm working to re-write the query while the dev sees if we even need this query anymore.

On Wed, Aug 15, 2018 at 2:39 PM, Adrian Klaver wrote:

> On 08/15/2018 12:31 PM, Don Seiler wrote:
>
>> PostgreSQL 9.6.6 on CentOS.
>>
>> We have a report query that has gone from maybe a few seconds to run to a
>> few minutes to run since mid-July. Looking at the output of EXPLAIN
>> ANALYZE, the row count estimates are way off, even though this table was
>> just analyzed a day or so ago. What's more bizarre to me is that the row
>> count estimate is *always* 75 for every node of the plan, where the actual
>> rows is in the hundreds or thousands. This table is one of the busiest
>> tables in our production database (many inserts and updates). It is
>> autovacuumed and autoanalyzed a few times per week, although I'm looking to
>> change it to a nightly manual schedule to avoid daytime autovacuums.
>>
>> Hash Join  (cost=1869142.34..1869146.15 rows=75 width=88) (actual time=179877.869..179878.011 rows=759 loops=1)
>>   Hash Cond: (stores.pkey = lt.store_pkey)
>>   Buffers: shared hit=1654593 read=331897 dirtied=249
>>   ->  Seq Scan on stores  (cost=0.00..2.77 rows=77 width=22) (actual time=0.007..0.023 rows=78 loops=1)
>>         Buffers: shared hit=2
>>   ->  Hash  (cost=1869141.40..1869141.40 rows=75 width=50) (actual time=179877.847..179877.847 rows=759 loops=1)
>>         Buckets: 1024  Batches: 1  Memory Usage: 73kB
>>         Buffers: shared hit=1654591 read=331897 dirtied=249
>>         ->  Subquery Scan on lt  (cost=1869138.59..1869141.40 rows=75 width=50) (actual time=179875.976..179877.697 rows=759 loops=1)
>>               Buffers: shared hit=1654591 read=331897 dirtied=249
>>               ->  GroupAggregate  (cost=1869138.59..1869140.65 rows=75 width=50) (actual time=179875.976..179877.606 rows=759 loops=1)
>>                     Group Key: lts.store_pkey, lts.owner, (date_trunc('minute'::text, lts.date_gifted))
>>                     Filter: (count(*) IS NOT NULL)
>>                     Buffers: shared hit=1654591 read=331897 dirtied=249
>>                     ->  Sort  (cost=1869138.59..1869138.78 rows=75 width=42) (actual time=179875.961..179876.470 rows=6731 loops=1)
>>                           Sort Key: lts.store_pkey, lts.entry_source_owner, (date_trunc('minute'::text, lts.date_gifted))
>>                           Sort Method: quicksort  Memory: 757kB
>>                           Buffers: shared hit=1654591 read=331897 dirtied=249
>>                           ->  Index Scan using gifts_date_added on gifts lts  (cost=0.56..1869136.25 rows=75 width=42) (actual time=190.657..179870.165 rows=6731 loops=1)
>>                                 Index Cond: ((date_added > '2018-07-14 11:13:05'::timestamp without time zone) AND (date_added < '2018-08-13 14:14:21'::timestamp without time zone))
>>                                 Filter: ((date_gifted >= '2018-08-13 11:13:05'::timestamp without time zone) AND (date_gifted < '2018-08-13 14:14:21'::timestamp without time zone))
>>                                 Rows Removed by Filter: 938197
>>                                 Buffers: shared hit=1654591 read=331897 dirtied=249
>> Planning time: 0.426 ms
>> Execution time: 179893.894 ms
>>
>> I don't have a version of this query from prior to this summer, but
>> getting explain plans for older data from older sandboxes shows a similar plan.
Re: Odd Row Estimates in Query Plan (rows=75)
On Wed, Aug 15, 2018 at 3:31 PM, Adrian Klaver wrote:

> lts.date_added > '2017-07-14 11:13:05'
>
> and
>
> lts.date_gifted >= '2017-08-13 11:13:05'
> ?
>
> In other words one '>' and the other '>=' ?

The date_added filters were added just to use that index and with a broad range, since there isn't a filter on date_gifted. You'll notice the date_added range is 30 days but the date_gifted range is 3 hours. We really only care about date_gifted but at this time there isn't an index on that field.

Even as I experiment with some query rewrites, the EXPLAIN ANALYZE always says rows=75. I'm *very* curious to see why it is using that value.

Don.

--
Don Seiler
www.seiler.us
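[A sketch of the index the thread suggests adding, using CONCURRENTLY since the table is busy; the index name is hypothetical.]

CREATE INDEX CONCURRENTLY gifts_date_gifted_idx
    ON foo.gifts (date_gifted);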
Re: Replication question
> I thought I read somewhere that in 9.6, as long as the WAL log is
> available on disk or in the archive the replication server will provide
> that to the replication client, and my archive NFS mount didn’t have to be
> available to all replication clients.

Streaming replication will only read from the WAL files in the $PGDATA/pg_xlog directory. It will not read from archives. So, yes, you would need your NFS mount on the replica (or otherwise copy the archive files to the replica).

Don.

--
Don Seiler
www.seiler.us
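[For a 9.6 replica, archived WAL is normally pulled in via restore_command in recovery.conf rather than over the streaming connection; a minimal sketch with a hypothetical NFS path and connection string.]

# recovery.conf on the replica (PostgreSQL 9.6)
standby_mode = 'on'
primary_conninfo = 'host=primary-host port=5432 user=replicator'
restore_command = 'cp /mnt/wal_archive/%f %p'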
Not Null Constraint vs Query Planning
If I have a NOT NULL constraint on a column, and then run a query where that column IS NULL, does the optimizer "short-circuit" the query to return 0 rows right away? If so, is there a way to see that it is doing so? I've been running a few explain plans this morning and they all look the same. Here I create a table in PG10 with 10 million dummy rows, no indexes, vacuumed/analyzed. I then query before and after and don't notice much difference. Wondering if there's any way to see an indication that the constraint was used in the query planning. My sample runs are at the end of this email.

The REAL reason for this is that I'm wondering if I created a NOT NULL check constraint with "NOT VALID", would that then NOT be considered in such a "short-circuit" case until I ran the VALIDATE CONSTRAINT on it? Perhaps I should have just asked this in its own thread but I started diving into the query plan thing.

postgres=# explain (analyze, buffers) select name from people where created_at is null;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on people  (cost=0.00..203093.21 rows=1 width=33) (actual time=5365.886..5365.886 rows=0 loops=1)
   Filter: (created_at IS NULL)
   Rows Removed by Filter: 10000000
   Buffers: shared hit=12828 read=90265
 Planning time: 2.558 ms
 Execution time: 5379.862 ms
(6 rows)

postgres=# alter table people alter column created_at set not null;
ALTER TABLE
postgres=# vacuum analyze people;
VACUUM
postgres=# explain (analyze, buffers) select name from people where created_at is null;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on people  (cost=0.00..203092.49 rows=1 width=33) (actual time=2339.254..2339.254 rows=0 loops=1)
   Filter: (created_at IS NULL)
   Rows Removed by Filter: 10000000
   Buffers: shared hit=12938 read=90155
 Planning time: 0.390 ms
 Execution time: 2339.274 ms
(6 rows)

--
Don Seiler
www.seiler.us
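[A sketch of the NOT VALID check constraint being described, against the same people table; the constraint name is hypothetical.]

-- added without scanning existing rows; new writes are checked immediately
ALTER TABLE people
    ADD CONSTRAINT people_created_at_not_null
    CHECK (created_at IS NOT NULL) NOT VALID;

-- later, scan the table and mark the constraint as validated
ALTER TABLE people VALIDATE CONSTRAINT people_created_at_not_null;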
Re: Not Null Constraint vs Query Planning
On Mon, Mar 2, 2020, 12:30 Vik Fearing wrote:

> On 02/03/2020 18:09, Don Seiler wrote:
>> The REAL reason for this is that I'm wondering if I created a NOT NULL
>> check constraint with "NOT VALID" would that then NOT be considered in such
>> a "short-circuit" case until I ran the VALIDATE CONSTRAINT on it? Perhaps I
>> should have just asked this in its own thread but I started diving into the
>> query plan thing.
>
> You cannot do this because NOT NULL isn't a real constraint (meaning it
> does not appear in pg_constraint). There have been several attempts to
> make it a real constraint over the years but so far nothing has come of
> them.

Using the check constraint method seemed to allow for the "not valid" step. I'm curious what the difference is between a NOT NULL check constraint versus setting the column to NOT NULL (assuming both are validated).

Don.
Mixed Locales and Upgrading
Good morning,

I have a few clusters that need to be upgraded from PG 9.6 or 10 to 12. Normally I'd just plan to pg_upgrade the lot and be good to go. However I've found that quite a few (including our biggest/busiest database) have mixed locales.

In the case of the biggest/busiest database, the cluster was created with locale en_US (NOT en_US.UTF-8), and so the databases have encoding LATIN1. However this database has encoding UTF8 while still having ctype and collation of en_US. I've since found that when this was last upgraded, they ran "update pg_database set encoding = pg_char_to_encoding('UTF8') where datname = 'test';" to change the encoding.

In my testing, pg_upgrade breaks when trying to restore this since UTF8 isn't supported in en_US for the CREATE DATABASE command used during pg_restore:

command: "/usr/lib/postgresql/12/bin/pg_restore" --host /var/lib/postgresql --port 50432 --username postgres --create --exit-on-error --verbose --dbname template1 "pg_upgrade_dump_16385.custom" >> "pg_upgrade_dump_16385.log" 2>&1
pg_restore: connecting to database for restore
pg_restore: creating DATABASE "test"
pg_restore: while PROCESSING TOC:
pg_restore: from TOC entry 2164; 1262 16385 DATABASE test postgres
pg_restore: error: could not execute query: ERROR:  encoding "UTF8" does not match locale "en_US"
DETAIL:  The chosen LC_CTYPE setting requires encoding "LATIN1".
Command was: CREATE DATABASE "test" WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'en_US' LC_CTYPE = 'en_US';

Is there a way around this while still using pg_upgrade? My understanding is that a full dump and restore into a new DB with everything set to en_US.UTF-8 is the only way to convert these (if I have to dump/restore, I wouldn't want to keep the mixed environment). Even with parallel jobs, I imagine that's a bit of downtime but I'll have to wait until I can get a copy of prod data to test with to be sure.

Is logical replication an option here? Either maintaining the mixed environment or converting everything to en_US.UTF-8? I'm relatively new in this shop but I'm told they didn't mean to use en_US and there's no reason they wouldn't want to just use the standard/default UTF-8.

Thanks,
Don.

--
Don Seiler
www.seiler.us
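[A quick sketch of seeing the mismatched settings per database, run on each cluster.]

SELECT datname,
       pg_encoding_to_char(encoding) AS encoding,
       datcollate,
       datctype
  FROM pg_database
 ORDER BY datname;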
Re: Mixed Locales and Upgrading
On Mon, Mar 16, 2020 at 10:28 AM Tom Lane wrote:

> Egad.

My thoughts exactly.

> Well, in principle you could likewise manually update pg_database's
> datcollate and datctype columns to say "en_US.utf8". However, there's
> a much bigger problem here --- what steps if any did this cowboy take
> to ensure that the data inside the database was valid UTF8?

No steps that I've seen from the chat history I've been able to search. I'm not sure if there was an (invalid) assumption that LATIN1 is a subset of UTF-8 or if it was done in a panic to get the import/update working years ago.

> I don't think you should use pg_upgrade here at all. A dump/restore
> is really the only way to make sure that you have validly encoded data.

That is what I thought, and probably not what they'll want to hear given the downtime involved. Even with parallel dump/restore jobs, I imagine it will take quite a while (this first DB is almost 900GB).

> However, if it's only one database out of a bunch, you could do something
> like
>
> * pg_dump that one database;
> * drop said database;
> * pg_upgrade everything else;
> * restore that one database from dump.

In the case of this busy cluster, the layout is like this:

postgres=# \l
                                List of databases
   Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
-----------+----------+----------+------------+------------+-----------------------
 f_all     | postgres | UTF8     | en_US      | en_US      |
 postgres  | postgres | LATIN1   | en_US      | en_US      | =Tc/postgres          +
           |          |          |            |            | postgres=CTc/postgres
 template0 | postgres | LATIN1   | en_US      | en_US      | =c/postgres           +
           |          |          |            |            | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.utf8 | en_US.utf8 |
(4 rows)

So the template1 DB was dropped and recreated with the collate and ctype as well and then set to be a template again. But I believe that was well after f_all was changed (and so probably no need for the template1 change).

In this case, if this is the only DB in the cluster, would it make sense to just create a new one as en_US.UTF-8 and then restore a dump of f_all into a pre-created en_US.UTF-8 DB?

We have a few other mixed environments similar to this as well. Some have postgres and both template DBs with this same UTF8/en_US/en_US configuration.

Is logical replication an option here? If the target DB were set up as en_US.UTF-8 across the board, would logical replication safely replicate and convert the data until we could then cut over?

Thanks,
Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Tue, Mar 17, 2020 at 8:06 AM Don Seiler wrote:

> On Mon, Mar 16, 2020 at 10:28 AM Tom Lane wrote:
>
>> Well, in principle you could likewise manually update pg_database's
>> datcollate and datctype columns to say "en_US.utf8". However, there's
>> a much bigger problem here --- what steps if any did this cowboy take
>> to ensure that the data inside the database was valid UTF8?

Is there a way to programmatically check for data that might be a problem now?

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Tue, Mar 17, 2020 at 8:56 AM Tom Lane wrote:

> Yikes. Well, if there aren't obvious operational problems, it might be
> that the data is actually UTF8-clean, or almost entirely so. Maybe you
> could look at the problem as being one of validation. In that case,
> it'd be possible to consider not taking the production DB down, but just
> doing a pg_dump from it and seeing if you can restore somewhere else.
> If not, fix the broken data; repeat till clean. After that you could
> do pg_upgrade with a clear conscience. I think you'll still end up
> manually fixing the inconsistent datcollate/datctype settings though.

For this test, would we restore into an en_US.UTF-8/UTF8 database? Then, assuming no errors (or fixing any errors until clean), we change the datcollate/datctype settings in prod and proceed with pg_upgrade (obviously after testing all of that heavily)?

What are the ramifications of changing collation like that? Should we consider rebuilding indexes ASAP after that?

Don.

--
Don Seiler
www.seiler.us
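[A rough sketch of the validation pass Tom describes, restoring into a scratch database created with the target locale; the host names and dump file path are hypothetical.]

# scratch database with the desired locale/encoding
createdb -h scratch-host -T template0 -E UTF8 --locale=en_US.UTF-8 f_all_validate

# dump production and attempt a restore; invalid byte sequences will error out
pg_dump -h prod-host -Fc -f f_all.dump f_all
pg_restore -h scratch-host --exit-on-error -d f_all_validate f_all.dump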
Re: Mixed Locales and Upgrading
On Tue, Mar 17, 2020 at 9:25 PM Michael Paquier wrote:

> There is no way to know how much indexes would get broken without
> having a look at it. Anything ASCII-based should be of no problem.
> If you have a doubt, reindexing every index which includes text column
> data is the best course of action in my opinion if you have any
> doubts, because that's safe even if it has a higher cost.

Here's the fun part. A lot of the tables use UUIDv4 strings for primary keys. However these are stored in text/varchar columns.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Sun, Mar 22, 2020 at 4:48 PM Don Seiler wrote:

> Here's the fun part. A lot of the tables use UUIDv4 strings for primary
> keys. However these are stored in text/varchar columns.

Actually, would I need to re-index on text columns that we know contain UUID strings? UUID characters seem to be pretty basic alphanumeric ASCII characters.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Mon, Mar 30, 2020 at 4:30 PM Tom Lane wrote:

> Don Seiler writes:
>> Actually, would I need to re-index on text columns that we know contain
>> UUID strings? UUID characters seem to be pretty basic alphanumeric ASCII
>> characters.
>
> I think you're all right with respect to those, since they're the
> same under any encoding. It's columns containing non-ASCII characters
> that you'd want to worry about reindexing.

That's what I was hoping to hear. Thanks!

Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Mon, Mar 30, 2020 at 4:39 PM Don Seiler wrote:

> On Mon, Mar 30, 2020 at 4:30 PM Tom Lane wrote:
>
>> Don Seiler writes:
>>> Actually, would I need to re-index on text columns that we know contain
>>> UUID strings? UUID characters seem to be pretty basic alphanumeric ASCII
>>> characters.
>>
>> I think you're all right with respect to those, since they're the
>> same under any encoding. It's columns containing non-ASCII characters
>> that you'd want to worry about reindexing.

Follow-up question: the locale setting on the host would still be set to en_US (as would the postgres and template0 databases). Should I look to change that locale on the system to en_US.UTF-8, or even just for the postgres user that the DB cluster runs as? What are the ramifications of doing (or not doing) so?

Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
On Tue, Apr 7, 2020 at 11:41 AM Don Seiler wrote:

> Follow-up question: the locale setting on the host would still be set to
> en_US (as would the postgres and template0 databases). Should I look to
> change that locale on the system to en_US.UTF-8, or even just for the
> postgres user that the DB cluster runs as? What are the ramifications of
> doing (or not doing) so?

One more question around the GUC settings for locale. It is currently set to this:

# select name,setting from pg_settings where name like 'lc%';
    name     | setting
-------------+---------
 lc_collate  | en_US
 lc_ctype    | en_US
 lc_messages |
 lc_monetary | C
 lc_numeric  | C
 lc_time     | C

Since I'm not changing the postgres or template0 databases (leaving those as en_US/LATIN1), do I keep lc_collate/lc_ctype as en_US? It's just the template1 and application database that I've set to en_US.UTF-8.

I'm also struggling to see how lc_messages is an empty string. It is commented out in postgresql.conf but suggests 'C' will be the default. The OS locale LC_MESSAGES is set to en_US on the primary but I also see it is set to en_US on the newer replica hosts. What value would be used for lc_messages? I'm trying to create an empty DB with these same settings but if I omit --lc-messages it uses the OS locale value, and I can't set it to an empty string.

Don.

--
Don Seiler
www.seiler.us
template0 needing vacuum freeze?
PG 9.6 (9.6.8). Got an alert today when template0 and template1 were both over 50% towards TXID wraparound. I could vacuum template1 but couldn't vacuum template0 without first allowing connections. This is what it looked like before:

# SELECT datname, age(datfrozenxid), current_setting('autovacuum_freeze_max_age') FROM pg_database ORDER BY 2 DESC;
  datname  |    age     | current_setting
-----------+------------+-----------------
 foo_db    | 1022106099 | 200000000
 template0 | 1000278345 | 200000000
 postgres  |     643729 | 200000000
 template1 |     643729 | 200000000
(4 rows)

I've since allowed connections and ran "vacuumdb --freeze" on it and then immediately disabled the connections to it again. But I'm curious how template0 would be growing in age like this. Even now I see the template0 age growing. I can say that these DBs have previously been altered for locale changes as well.

I'm also running a long "vacuum freeze" on foo_db that will take a few days, after seeing that an autovacuum on a big table had been running since Feb 2 and making no progress, with over 850M dead tuples according to pg_stat_all_tables. I estimate 3-4 more days to go on that one. Once that's done I'll be scheduling manual vacuum jobs. Just wondering if that would somehow affect regular template0 cleanup though.

I don't see anything in the postgres log related to template0 other than my manual interactions today.

--
Don Seiler
www.seiler.us
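[For reference, a sketch of the manual template0 freeze described above; assumes superuser access.]

-- temporarily allow connections to template0
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;

-- from the shell:
--   vacuumdb --freeze -d template0

-- then lock it back down
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS false;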
Re: template0 needing vacuum freeze?
On Sat, May 16, 2020 at 12:44 PM Tom Lane wrote:

> So it's unsurprising that the freeze age increases until autovacuum
> decides to do something about it. I'm suspicious that your alert settings
> are too aggressive and are notifying you before autovacuum kicks in.
>
> You should *not* have had to do anything manual about this, unless you
> have frobbed your autovac settings to the point of brokenness.

Shouldn't autovacuum have kicked in when the age of a table reaches 200M (our autovacuum_freeze_max_age is left at that default)? I see other tables in our app DB triggering the autovacuum "to prevent wrap-around" when they reach 200M. That's what had me concerned to see template0 with an age over 1B and no autovacuum even trying to clean up for it.

Don.

--
Don Seiler
www.seiler.us
Re: template0 needing vacuum freeze?
On Mon, May 18, 2020 at 1:40 AM Laurenz Albe wrote:

> Did you see any weird messages when you vacuumed "template0"?

No.

> Did "datfrozenxid" shrink after the operation?
>
> "foo_db" seems to be the bigger problem. Perhaps autovacuum never handled
> "template0" because it concluded (rightly) that it has to deal with
> "foo_db" first.

Yes, this DB had a table in it that had been autovacuuming since Feb 2. Its age is halfway to wraparound so I'm in the middle of a manual VACUUM FREEZE on it. I'd be interested in knowing if that prevents template0 from autovacuuming itself. There are no other autovacuum jobs running.

>> I can say that these DBs have previously been altered
>> for locale changes as well.
>
> Would you care to explain that? You changed "template0"? How?

It was changed before my time here to change the encoding from LATIN1 to UTF manually. One of the fun treats I'm working to correct as I also prepare these for upgrade to PG12.

> What are your non-default autovacuum settings? Perhaps you should speed up
> autovacuum by reducing "autovacuum_vacuum_cost_delay" to 2ms or less, and
> by increasing "maintenance_work_mem".

All autovacuum settings on this DB are default. Cost delay is at the default 20ms. maintenance_work_mem I've already increased to 512MB (this VM has 8GB RAM).

--
Don Seiler
www.seiler.us
Re: template0 needing vacuum freeze?
On Mon, May 18, 2020 at 8:51 PM Tom Lane wrote:

> Do you have an idea why autovac was failing to clear the issue on that one
> problem table, though?

Before I intervened, the maintenance_work_mem was only 16MB (they had encoded and carried over PG 8 defaults in the chef recipe). I bumped it to 512MB before kicking off my freeze. That's the big factor I can think of.

This is a huge table obviously as well. By the time it caught my attention there were 850M dead tuples to be cleaned up. My VACUUM FREEZE has been running for just about 5 days and is probably halfway done. We shouldn't be at risk of hitting wraparound though (only 52% there).

Don.

--
Don Seiler
www.seiler.us
Re: Mixed Locales and Upgrading
Good morning,

Back with a follow-up question to all this. I'm wondering if we shouldn't also change the locale settings for postgres/template0/template1 to match our new desired setting of en_US.UTF-8 with UTF8 encoding. We haven't written anything to postgres. Some of our DB clusters have changed these already (but still leaving things mixed) but I want to make things uniform for future upgrades to go a lot more smoothly.

Some examples of current DB clusters that need to be fixed. The first one has the more problematic incompatible mixing within the DBs (the original point of this thread), the others just have different settings across DBs that I'd like to change going forward as well.

  datname  | encoding | datcollate |  datctype  | size_mb
-----------+----------+------------+------------+---------
 xx        | UTF8     | en_US      | en_US      | 1390789
 postgres  | UTF8     | en_US      | en_US      |       6
 template0 | UTF8     | en_US      | en_US      |       6
 template1 | UTF8     | en_US      | en_US      |       6

  datname  | encoding | datcollate |  datctype  | size_mb
-----------+----------+------------+------------+---------
 xxx       | UTF8     | en_US.utf8 | en_US.utf8 |    2178
 postgres  | LATIN1   | en_US      | en_US      |       7
 template0 | LATIN1   | en_US      | en_US      |       7
 template1 | UTF8     | en_US.utf8 | en_US.utf8 |       7

  datname  | encoding | datcollate |  datctype  | size_mb
-----------+----------+------------+------------+---------
 xx        | UTF8     | en_US.utf8 | en_US.utf8 |     345
 postgres  | LATIN1   | en_US      | en_US      |       7
 template0 | UTF8     | en_US.utf8 | en_US.utf8 |       7
 template1 | UTF8     | en_US.utf8 | en_US.utf8 |       7

For the smaller DBs I was planning to just dump/restore into a new cluster as I upgrade to PG12 as well. However two that have the problem are the two biggest ones where the downtime for dump/restore would be too great. So I'm wondering if there is risk or harm in running an UPDATE pg_database command on postgres/template0/template1 as needed and re-indexing afterward.

--
Don Seiler
www.seiler.us
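[A sketch of the catalog tweak being considered, as discussed earlier in the thread, followed by reindexing; this is the kind of change to test on a copy first.]

-- as superuser: point the template/postgres databases at the new locale
UPDATE pg_database
   SET datcollate = 'en_US.utf8',
       datctype   = 'en_US.utf8'
 WHERE datname IN ('postgres', 'template0', 'template1');

-- then, connected to each affected database in turn, rebuild indexes
-- that could be affected by the collation change
REINDEX DATABASE postgres;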
autovacuum on primary blocking queries on replica?
- PostgreSQL 12.9
- PGDG Ubuntu 18.04 image
- Streaming physical replication
- hot_standby_feedback = on

We use a read replica to offload a lot of (what should be) quick queries. This morning we had an incident where these queries were all blocking on AccessShareLock waits, written to the log as:

2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp LOG:  process 8185 still waiting for AccessShareLock on relation 16834 of database 16401 after 1000.228 ms at character 204
2022-05-27 15:23:53.476 UTC [8185] foo@foo_all - myapp DETAIL:  Process holding the lock: 10822. Wait queue: 32373, 8185, 13782, 14290, 13215, 4427, 16056, 13446, 13448, 13445, 9268, 11784, 14469, 14502, 14340, 6510, 8730, 10876, 13012, 15760, 6645, 14712, 13006, 15923, 14485, 15924, 13447, 9926, 13005, 11803, 13013, 13298, 16059, 9266, 4943, 14470, 13148, 12118, 14491, 9927, 11791, 15097, 11804, 16060, 14711, 5469, 8113, 16061, 14698, 14475, 10620, 13150, 14501, 14465, 14341, 16062, 16065, 14484, 7060, 11785, 16068, 16063, 16064, 16066.

This went on for 30 seconds (the value of max_standby_streaming_delay) until PG killed the blocking process:

2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp FATAL:  terminating connection due to conflict with recovery
2022-05-27 15:24:22.474 UTC [10822] foo@foo_all - anotherapp DETAIL:  User was holding a relation lock for too long.

I'm trying to find the root cause of why this started. We did see an UPDATE that was executed at 2022-05-27 15:23:37.000 UTC on the table in question (relation 16384) that ran against all rows of the table (only ~8200 rows, not huge) but the transaction was rolled back (due to a syntax error later in the transaction). 15 seconds later we then see an aggressive autovacuum on this table:

2022-05-27 15:23:52.507 UTC [30513] LOG:  automatic aggressive vacuum of table "foo_all.public.industry": index scans: 1
        pages: 252 removed, 323 remain, 0 skipped due to pins, 0 skipped frozen
        tuples: 8252 removed, 8252 remain, 0 are dead but not yet removable, oldest xmin: 1670999292
        buffer usage: 12219 hits, 137 misses, 54 dirtied
        avg read rate: 2.372 MB/s, avg write rate: 0.935 MB/s
        system usage: CPU: user: 0.14 s, system: 0.00 s, elapsed: 0.45 s

and less than a second after that is when we see the first AccessShareLock message on the replica.

I've been reading tales of autovacuum taking an AccessExclusiveLock when truncating empty pages at the end of a table. I'm imagining that updating every row of a table and then rolling back would leave all of those rows empty at the end and qualify for truncation and lead to the scenario I saw this morning. I'm still not entirely satisfied since that table in question was so small (only 8252 rows), so I wouldn't imagine it would hold things up as long as it did. Although the blocking session on the replica was an application session, not any background/recovery process.

I'm wondering if I'm on the right trail or if there is a much firmer explanation for what happened.

Thanks,
Don.

--
Don Seiler
www.seiler.us
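[A small sketch of resolving the relation OID from those log lines back to a table name; it must be run while connected to the database with OID 16401.]

SELECT 16834::regclass AS waiting_on_relation;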
Re: Compile and build portable postgresql for mac
On Wed, Oct 24, 2018 at 7:22 PM Pratik Parikh wrote:

> How to compile and build portable postgresql for mac. Can someone
> point me how the builds are done for general distribution. A script would
> be appreciated if anyone has it.

I'm not sure what you mean by "portable"? I'm going to answer assuming you just want to run postgres on your own Mac. There are a couple of options.

Peter Eisentraut has a great collection of Homebrew scripts that you can plug into: https://github.com/petere/homebrew-postgresql

I've recently switched to pgenv, which provides a clean CLI for building and using all the various supported versions: https://github.com/theory/pgenv

Don.

--
Don Seiler
www.seiler.us
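[For illustration, a typical pgenv session looks roughly like the following; the exact subcommands and version number are assumptions based on the project's README rather than something verified here.]

# assumed pgenv usage; see https://github.com/theory/pgenv for the authoritative docs
pgenv available     # list versions pgenv knows how to build
pgenv build 11.0    # download, compile, and install 11.0
pgenv use 11.0      # initialize (if needed) and start that version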
Re: Postgres 11 chooses seq scan instead of index-only scan
On Thu, Jan 24, 2019 at 9:01 AM twoflower wrote:

> Yes! That was it, after running VACUUM TABLE history_translation, the
> query is now executed using index-only scan.
>
> I was under the impression that ANALYZE TABLE history_translation is
> enough, but it is not.

Only a VACUUM will update the visibility map.

https://www.postgresql.org/docs/current/storage-vm.html

I used to think the same, that ANALYZE was enough, coming from an Oracle background. I learned later that the visibility map isn't just used to determine what to vacuum, but it is used by the optimizer/planner when evaluating execution plans.

--
Don Seiler
www.seiler.us
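[A quick sketch of checking how much of a table the planner considers all-visible, which feeds index-only scan costing; the table name is taken from the quoted message.]

SELECT relname,
       relpages,
       relallvisible,
       round(100.0 * relallvisible / greatest(relpages, 1), 1) AS pct_all_visible
  FROM pg_class
 WHERE relname = 'history_translation';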
Native Logical Replication Initial Import Qs
Good afternoon.

I'm looking at having to move a fleet of PG 12 databases from Ubuntu 18.04 to Ubuntu 22.04. This means crossing the dreaded libc collation change, so we're looking to have to migrate via pg_dump/restore or logical replication for the bigger/busier ones. We're also planning to use PG 15 on the destination (Ubuntu 22.04) side to kill two birds with one stone, as much as I'd prefer to have minimal moving parts.

On the logical replication front, the concern is with the initial data import that happens when the subscription is created (by default). I know that you can tell the subscription to not copy data and instead use pg_dump and a replication slot snapshot to achieve this manually. However I'm unable to explain (to myself) why this is better than just having the subscription do it upon creation. Given that I can create pub/sub sets for individual tables for parallel operations, I'm curious what advantages there are in using pg_dump to do this import.

I had been planning to have pg_dump pipe directly into the destination database via psql. Is this faster than just having the subscription do the import? I'm curious as to why or not. I know to only use the minimal indexes required on the destination side (ie identity-related indexes) and omit other indexes and constraints until after the data is loaded, but that is true for either method.

Thanks,
Don.

--
Don Seiler
www.seiler.us
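[For reference, a minimal sketch of the two initial-load styles being compared; the connection string, publication/subscription names, and table list are hypothetical.]

-- on the PG 12 publisher
CREATE PUBLICATION pub_batch1 FOR TABLE t1, t2, t3;

-- on the PG 15 subscriber: let the subscription do the initial table sync itself
CREATE SUBSCRIPTION sub_batch1
    CONNECTION 'host=old-host dbname=app user=repl'
    PUBLICATION pub_batch1
    WITH (copy_data = true);

-- alternative: load the data out-of-band (e.g. pg_dump | psql) and only stream changes
CREATE SUBSCRIPTION sub_batch1
    CONNECTION 'host=old-host dbname=app user=repl'
    PUBLICATION pub_batch1
    WITH (copy_data = false);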
Re: Native Logical Replication Initial Import Qs
On Wed, Jun 7, 2023 at 4:30 PM Jeremy Schneider wrote:

> On 6/7/23 2:12 PM, Don Seiler wrote:
>> On the logical replication front, the concern is with the initial data
>> import that happens when the subscription is created (by default). I
>> know that you can tell the subscription to not copy data and instead use
>> pg_dump and a replication slot snapshot to achieve this manually.
>> However I'm unable to explain (to myself) why this is better than just
>> having the subscription do it upon creation. Given that I can create
>> pub/sub sets for individual tables for parallel operations, I'm curious
>> what advantages there are in using pg_dump to do this import.
>
> FWIW, I think the place this feature shines the most is when you can
> safely leverage things like storage-level snapshots. Sometimes that
> means you can get a copy of a multi-TB database almost instantly if the
> storage or filesystem does copy-on-write, for example database lab
> (postgres.ai) which uses ZFS.
>
> Another thing I can think of is that while it's true you can create
> multiple pub/sub sets, I'm not sure you can reduce the number of sets
> later. So if you were concerned about having too many slots doing
> decoding on the source, then you might want the flexibility of pg_dump
> (or perhaps restoring a backup) to get more parallelism while having
> more control over how many slots will be used later.
>
> In your case, the whole setup is hopefully temporary, so maybe these
> particular concerns aren't as relevant to you.
>
> This is just what comes to mind... probably there's a few more things
> I'm not thinking of and hopefully others will chime in. :)

Yes, my setup would be temporary, just as long as needed to complete the migration to a new host.

One other use case is using pg_dump/restore to move older data in time-based partitions (assuming data isn't modified for older partition periods). Those older partitions would just need to be copied but wouldn't need any ongoing pub/sub/replication set up for them. So you'd have both in use in that case.

Thanks,
Don.

--
Don Seiler
www.seiler.us
Runaway Initial Table Syncs in Logical Replication?
Logical Rep question. Publisher is PG12 on Ubuntu 18.04, subscriber is PG15 on Ubuntu 22.04.

I bumped up some values to see how initial load times change. I set max_logical_replication_workers to 20 and max_sync_workers_per_subscription to 4. I'm using 3 subscriptions; 2 of the subscriptions have 3 tables each, the other has a lot more, say 100 for the sake of example.

What we noticed was that the subscriber basically maxed out the wal senders and replication slots on the publisher side, even when the publisher settings were bumped up to 50 each. 3 replication slots were for the 3 subs and the rest (47) were sync workers. Was it creating a sync worker for each table right away, or at least trying to? The subscriber side was still complaining that it couldn't create more replication slots on the publisher.

I was expecting to see max_sync_workers_per_subscription (4) x # of subs (3) = 12 sync workers in action, so this was a big surprise. Is this expected behavior?

--
Don Seiler
www.seiler.us
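[A sketch of watching the sync workers from the subscriber side; in pg_stat_subscription, rows with a non-null relid are tablesync workers, while the row with a null relid is the main apply worker.]

SELECT subname,
       pid,
       relid::regclass AS syncing_table
  FROM pg_stat_subscription
 ORDER BY subname, relid;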
Re: Runaway Initial Table Syncs in Logical Replication?
: session time: 0:00:00.009 user=migrator database=foo host=10.228.88.74 port=17390

and then this from a different pid, maybe some kind of cleanup process? It was doing this for what seemed like all of the sync slots:

2023-08-03 18:47:47.975 UTC [3303] migrator@foo - sub01 ERROR:  replication slot "pg_19742_sync_17238_7263122209699118815" does not exist
2023-08-03 18:47:47.975 UTC [3303] migrator@foo - sub01 STATEMENT:  DROP_REPLICATION_SLOT pg_19742_sync_17238_7263122209699118815 WAIT

--
Don Seiler
www.seiler.us
Re: Calculating vm.nr_hugepages
On Wed, Aug 30, 2023 at 8:12 AM Troels Arvin wrote:

> Hello,
>
> I'm writing an Ansible play which is to set the correct value for
> vm.nr_hugepages on Linux servers where I hope to make Postgres make use
> of huge pages.
>
> However, I'm struggling to find the right formula.
>
> I assume I need to find the same value as I get from running "postgres
> -C shared_memory_size_in_huge_pages". I call that my target value.
> Note: I cannot simply run "postgres -C ...", because I need my Ansible
> play to work against a server where Postgres is running.
>
> I've tried using the formula described at
> https://www.cybertec-postgresql.com/en/huge-pages-postgresql/, but it
> produces a different value than my target:
>
> Using a shared_buffers value of 21965570048, like in Cybertec
> Postgresql's example:
> "postgres ... -C 21965570048B" yields: 10719
> The formula from Cybertec Postgresql says: 10475
>
> I've also tried doing what ChatGPT suggested:
> Number of Huge Pages when shared_buffers is set to 1 GiB
>   = shared_buffers / huge_page_size
>   = 1073741824 bytes / 2097152 bytes
>   = 512
> But that's also wrong compared to "postgres -C ..." (which said 542).
>
> Which formula can I use? It's OK for me for it to be slightly wrong
> compared to "postgres -C", but if it's wrong, it needs to be slightly
> higher than what "postgres -C" outputs, so that I'm sure there's enough
> huge pages for Postgres to be able to use them properly.

Good morning Troels,

I had a similar thread a couple of years ago, you may want to read:
https://www.postgresql.org/message-id/flat/CAHJZqBBLHFNs6it-fcJ6LEUXeC5t73soR3h50zUSFpg7894qfQ%40mail.gmail.com

In it, Justin Przby provides the detailed code for exactly what factors into HugePages, if you require that level of precision. I hadn't seen that Cybertec blog post before.

I ended up using my own equation that I derived in that thread after Justin shared his info. The chef/ruby code involved is:

padding = 100
if shared_buffers_size > 4
  padding = 500
end

shared_buffers_usage = shared_buffers_size + 200 + (25 * shared_buffers_size / 1024)
max_connections_usage = (max_connections - 100) / 20
wal_buffers_usage = (wal_buffers_size - 16) / 2

vm.nr_hugepages = ((shared_buffers_usage + max_connections_usage + wal_buffers_usage + padding) / 2).ceil()

wal_buffers_size is usually 16MB so wal_buffers_usage ends up being zeroed out. This has worked out for our various postgres VM sizes. There's obviously going to be a little extra HugePages that goes unused, but these VMs are dedicated for postgresql usage and shared_buffers_size defaults to 25% of VM memory so there's still plenty to spare. But we use this so we can configure vm.nr_hugepages at deployment time via Chef.

Don.

--
Don Seiler
www.seiler.us
Logical Replication vs. Free Replication Slots
Good morning,

I'm trying to set up native logical replication from PG 12.15 (Ubuntu 18.04) to 15.4 (Ubuntu 22.04). I wanted to set up 3 publications to split up the work. I initially created the three publications with a handful of tables each.

On the source (PG 12.15) instance, we have bumped max_replication_slots and max_wal_senders to 50, and max_sync_workers_per_subscription to 10.

When I create the subscriptions to start the initial table sync, the third one failed to enable, with this error:

2023-09-29 21:07:45.861 UTC [1853362] ERROR:  could not find free replication state slot for replication origin with ID 11
2023-09-29 21:07:45.861 UTC [1853362] HINT:  Increase max_replication_slots and try again.
2023-09-29 21:07:45.861 UTC [1853362] LOG:  subscription "sub03" has been disabled because of an error

However when I look at the publication side, there are only 20 replication slots created (out of 50 max), 3 of which are the subscriptions and the rest are the tablesync workers. So I'm trying to make sense of why it would tell me to increase max_replication_slots when I don't appear to be anywhere near the max.

--
Don Seiler
www.seiler.us
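[A diagnostic sketch for comparing usage against the limit on each node; note that on the subscriber side, max_replication_slots also caps how many replication origins can be tracked, which is what the error above is counting.]

-- on the publisher: slots in use vs. the limit
SELECT count(*) AS slots_in_use,
       current_setting('max_replication_slots') AS max_replication_slots
  FROM pg_replication_slots;

-- on the subscriber: replication origins in use vs. the same limit
SELECT count(*) AS origins_in_use,
       current_setting('max_replication_slots') AS max_replication_slots
  FROM pg_replication_origin;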
Re: Logical Replication vs. Free Replication Slots
On Tue, Oct 3, 2023 at 10:27 AM Don Seiler wrote:

> On the source (PG 12.15) instance, we have bumped max_replication_slots
> and max_wal_senders to 50, and max_sync_workers_per_subscription to 10.

Forgot to note that on the subscriber (PG 15.4) instance, max_sync_workers_per_subscription is 4, and max_logical_replication_workers is 20. Per the docs, max_sync_workers_per_subscription would have no effect on the publisher side.

Don.

--
Don Seiler
www.seiler.us
malloc errors in psql with libpq 16.1 on Mac
I have a colleague trying to use psql to connect to a remote DB, but he's getting this error:

FATAL:  no PostgreSQL user name specified in startup packet
psql(42705,0x7ff84a07b700) malloc: *** error for object 0xa6: pointer being freed was not allocated
psql(42705,0x7ff84a07b700) malloc: *** set a breakpoint in malloc_error_break to debug

He was using psql 14 and we upgraded him to psql 15 but no change. I noticed he was on libpq 16.1 (installed via homebrew). Another colleague who was able to connect fine was on libpq 15.3. He upgraded to libpq 16.1 and boom, he now gets the same error. One colleague is on an older amd64 Mac, the other is on a newer arm64 Mac.

They are working to try to pin the older version of libpq, but this is probably going to bust out to a larger group as other homebrew users run their routine updates. Just wondering if anyone else has had similar experiences or if this is a known issue?

--
Don Seiler
www.seiler.us
Re: malloc errors in psql with libpq 16.1 on Mac
On Mon, Nov 27, 2023 at 6:02 PM Tom Lane wrote:

> Is OpenSSL in use? If so, what version? I'm wondering if this is a
> variant of the OpenSSL 3.2.0 compatibility problem we just heard of:

My colleague does confirm it is OpenSSL 3.2.0:

==> openssl@3: stable 3.2.0 (bottled)

Does this mean that downgrading libpq wouldn't help? He'd have to downgrade openssl instead (or wait for a fix from somewhere upstream)?

Don.

--
Don Seiler
www.seiler.us
Use of max_slot_wal_keep_size parameter
Good morning,

With our recent upgrade to PG 15, we've put the max_slot_wal_keep_size parameter into place, setting it about 3% lower than the size of the volume dedicated to pg_wal. However last night we had an incident where the volume filled up while we were performing a massive insert from one table into another.

I don't believe the replica was lagging, and we have pgbackrest configured for async archiving, which I assume was fine. It seemed to just be a matter of PG not cleaning up the WAL. Our understanding was that max_slot_wal_keep_size would prevent this but perhaps it only deals with situations where the replication slot lag is a factor?

Don.

--
Don Seiler
www.seiler.us
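[On PG 13 and later, the slot view reports how close each slot is to the configured limit, which can help tell slot retention apart from other causes of pg_wal growth; a sketch.]

SELECT slot_name,
       slot_type,
       active,
       wal_status,   -- 'reserved', 'extended', 'unreserved', or 'lost'
       pg_size_pretty(safe_wal_size) AS safe_wal_size
  FROM pg_replication_slots;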
Re: Use of max_slot_wal_keep_size parameter
On Tue, Mar 26, 2024 at 9:09 AM Tom Lane wrote:

> My immediate reaction is that 3% is a mighty small margin for error.
> I don't know exactly how max_slot_wal_keep_size is enforced these
> days, but in the past restrictions like that were implemented by
> deciding during a checkpoint whether to unlink a no-longer-needed WAL
> file (if we had too much WAL) or rename/recycle it to become a future
> WAL segment (if not). So you could overshoot the specified target by
> more or less the amount of WAL that could be emitted between two
> checkpoints. Perhaps it's tighter nowadays, but I really doubt that
> it's exact-to-the-kilobyte-at-all-times.

In this case, the total volume size was 60GB and we had the parameter set to 58GB, but I imagine that can still be overwhelmed quickly. Maybe we should target a 20% buffer zone? We have wal_keep_size defaulted at 0.

Thanks,
Don.

--
Don Seiler
www.seiler.us
Dangerous Naming Confusion
Good evening,

Please see my gist at https://gist.github.com/dtseiler/9ef0a5e2b1e0efc6a13d5661436d4056 for a complete test case. I tested this on PG 12.6 and 13.2 and observed the same on both.

We were expecting the queries that use dts_temp to only return 3 rows. However the subquery starting at line 36 returns ALL 250,000 rows from dts_orders. Note that the "order_id" field doesn't exist in the dts_temp table, so I'm assuming PG is using the "order_id" field from the dts_orders table. If I use explicit table references like in the query at line 48, then I get the error I would expect that the "order_id" column doesn't exist in dts_temp. When I use the actual column name "a" for dts_temp, then I get the 3 rows back as expected.

I'm wondering if this is expected behavior that PG uses the dts_orders.order_id value in the subquery "select order_id from dts_temp" when dts_temp doesn't have its own order_id column. I would have expected an error that the column doesn't exist. Seems very counter-intuitive to think PG would use a column from a different table.

This issue was discovered today when this logic was used in an UPDATE and ended up locking all rows in a 5M row table and brought many apps to a grinding halt. Thankfully it was caught and killed before it actually updated anything.

Thanks,
Don.

--
Don Seiler
www.seiler.us
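[A condensed sketch of the behavior being described, using the table names from the gist but with a simplified shape; the exact DDL there may differ.]

-- dts_temp has no order_id column. The unqualified order_id silently resolves to
-- dts_orders.order_id from the outer query (a correlated reference), so the IN
-- condition matches every row of dts_orders.
SELECT count(*) FROM dts_orders
 WHERE order_id IN (SELECT order_id FROM dts_temp);

-- Qualifying the column surfaces the real problem instead:
-- ERROR: column t.order_id does not exist
SELECT count(*) FROM dts_orders o
 WHERE o.order_id IN (SELECT t.order_id FROM dts_temp t);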
Re: Dangerous Naming Confusion
On Mon, Mar 29, 2021 at 5:22 PM David G. Johnston < david.g.johns...@gmail.com> wrote: > On Mon, Mar 29, 2021 at 3:20 PM Adrian Klaver > wrote: > >> On 3/29/21 3:00 PM, Don Seiler wrote: >> > >> > I'm wondering if this is expected behavior that PG uses the >> > dts_orders.order_id value in the subquery "select order_id from >> > dts_temp" when dts_temp doesn't have its own order_id column. I would >> > have expected an error that the column doesn't exist. Seems very >> > counter-intuitive to think PG would use a column from a different table. >> >> See: >> >> >> https://www.postgresql.org/message-id/pine.lnx.4.56.0308011345320@krusty.credativ.de >> >> > There is also an FAQ entry: > > > https://wiki.postgresql.org/wiki/FAQ#Why_doesn.27t_PostgreSQL_report_a_column_not_found_error_when_using_the_wrong_name_in_a_subquery.3F > > David J. > > This is good to know. I figured it might be "working as expected". Still seems annoying/counter-intuitive to me but at least we know to look for it. Thanks, Don. -- Don Seiler www.seiler.us
Could not read block in file
This morning I got some errors from our pre-prod environment. This is running Pg 9.6.6 on CentOS 7.3. Just grepping out these errors (and obfuscating sensitive data):

2018-01-18 06:29:21 CST [11912]: [1570-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 39 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:29:42 CST [11959]: [517-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 40 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:30:09 CST [11912]: [1577-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 41 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:30:32 CST [11970]: [162-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 42 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:30:40 CST [11912]: [1583-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 43 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:31:04 CST [11970]: [168-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 44 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:31:31 CST [11959]: [595-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 45 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes

All of these errors occurred during attempts to insert into this liferay table. For example:

2018-01-18 06:29:21 CST [11912]: [1568-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com LOG: execute S_14: INSERT INTO quartz_FIRED_TRIGGERS (ENTRY_ID, TRIGGER_NAME, TRIGGER_GROUP, IS_VOLATILE, INSTANCE_NAME, FIRED_TIME, STATE, JOB_NAME, JOB_GROUP, IS_STATEFUL, REQUESTS_RECOVERY, PRIORITY) VALUES($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)
2018-01-18 06:29:21 CST [11912]: [1569-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com DETAIL: parameters: $1 = 'FOO', $2 = 'Bar CSVTrigger', $3 = 'DEFAULT', $4 = 'f', $5 = 'FOO', $6 = '123', $7 = 'BLAH', $8 = NULL, $9 = NULL, $10 = 'f', $11 = 'f', $12 = '5'
2018-01-18 06:29:21 CST [11912]: [1570-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com ERROR: could not read block 39 in file "pg_tblspc/16406/PG_9.6_201608131/16416/25690536": read only 0 of 8192 bytes
2018-01-18 06:29:21 CST [11912]: [1571-1] db=abcprod,user=abcreporting2,app=PostgreSQL JDBC Driver,client=abcchizapp1.mycompanyname.com STATEMENT: INSERT INTO quartz_FIRED_TRIGGERS (ENTRY_ID, TRIGGER_NAME, TRIGGER_GROUP, IS_VOLATILE, INSTANCE_NAME, FIRED_TIME, STATE, JOB_NAME, JOB_GROUP, IS_STATEFUL, REQUESTS_RECOVERY, PRIORITY) VALUES($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12)

These are the only occurrences of these errors. I've since been able to query this table as well as take a pg_dump of the schemas on this tablespace, with no errors either time.
So I can't seem to duplicate this problem. The app has successfully performed many, many inserts into that table since these errors. I don't see any errors in /var/log/messages that would indicate any filesystem issues either. Obviously the worry is possible corruption. Granted this is "only" pre-prod, but there will be concerns from the business side prior to upgrading our prod DB (currently 9.2.22). What else can/should I check here to make sure there isn't something wrong with this database cluster? -- Don Seiler www.seiler.us
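As a hedged starting point (not from the original thread), the file named in the error can be mapped back to a relation and then read end-to-end. Here 16406 is the tablespace OID and 25690536 the filenode taken from the error text, and this assumes you are connected to the database whose OID is 16416:

    -- Identify which relation the failing file belongs to
    SELECT pg_filenode_relation(16406, 25690536) AS suspect_relation;

    -- Then force a full read of it, e.g. with a sequential scan; if it turns out
    -- to be an index rather than a table, a REINDEX would rebuild it from the heap.
    -- SELECT count(*) FROM <suspect_relation>;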
vacuumdb --all Parallel Feature Request
Good afternoon folks. I've been playing around with some vacuumdb options this week as part of post-upgrade testing, in particular with parallel mode (--jobs=N). I noticed that vacuumdb --all will only work on one database at a time. This means that as it winds down to the last few tables in a particular database, workers sit idle. For example, I ran with 8 jobs, but while 1 job runs on the last big table of a particular database, the other 7 are doing nothing. It would be great if those idle jobs could move on to the next database to get a head start on the tables there. When you have big tables in multiple databases in the cluster, it could make a big difference. Also, running analyze_new_cluster.sh mentions this:

> If you would like default statistics as quickly as possible, cancel
> this script and run:
>     "/usr/pgsql-9.6/bin/vacuumdb" --all --analyze-only

I wonder if it would be worth also mentioning the --jobs=N option in that blurb. Yes, it's in the --help text, but it wouldn't hurt to highlight its availability. Don. -- Don Seiler www.seiler.us
pgBackRest backup from standby
Evening all. Looking to use pgBackRest to take a backup from a hot standby. I'm reading that pgBackRest still needs to connect to the primary and copy some files. My questions are:

1. What files does it need to copy? Config files? WAL files?
2. How does it connect? SSH?
3. Does pgBackRest need to be installed and configured on the primary as well?

Thanks, Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Sun, Feb 18, 2018 at 6:43 PM, Michael Paquier wrote: > > I am adding in CC: Stephen Frost and David Steele who work on the tool. > I assumed Stephen was already on this list, and I communicate with him regularly on Slack as well, but I'm just throwing this out there on a Sunday night. > You may want to contact the maintainers directly through github where > the project is maintained: > https://github.com/pgbackrest/pgbackrest Is that the place to just ask questions? I wasn't sure that "Issues" meant just questions versus problems or bugs. I didn't see any mention of a forum or list on their website, and there have been lots of pgBackRest questions on this list in the past, so I settled on this one. Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 8:18 AM, David Steele wrote: > It copies files that are not replicated from the primary so that a > primary-style backup is created. Anything that is replicated (which is > by far the bulk of the data) is copied from the standby. > OK so all data files would be copied from standby. Can you give me an example of the types of files that need to be copied from primary? > it's best to archive from the primary so a replication > failure does not affect your archiving. > Understood, just not something I can change in production primary at the moment. Hence looking to see about a quick one-off backup from standby. > Configuring pgBackRest, SSH, standby, and backup from standby are all > covered in the user guide. Thanks, I've been through it a few times and played with some test backups from primary clones. I just ditched my master/replica clone setup but I'll test there as well. I just had a couple questions about the mechanics. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 8:23 AM, David Steele wrote: > > Either is fine with me, but as Michael says I might miss postings to > -general. I'm sure somebody else would catch it, though. > OK, I'll make use of the issues tracker going forward. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 8:53 AM, David Steele wrote: > Anything *not* in global (except pg_control), base, pg_tblspc, > pg_xact/pg_clog, and pg_multixact are copied from the primary. > > For example, pg_stat is copied from the primary so these stats are > preserved on a standby backup. > So if I have tablespaces outside of $PGDATA (but symlinked from within pg_tblspc, of course), those will still be backed up from the standby, right? Is it right to say that the files that would be copied from primary are typically very small? So it isn't a huge transfer over the WAN (in my case)? > pgBackRest uses all the same exclusions as pg_basebackup, so many > dirs/files are not copied at all: pg_dynshmem, pg_notify, pg_replslot, > pg_serial, pg_snapshots, pg_stat_tmp, pg_subtrans, etc. > > Full list here > https://www.postgresql.org/docs/10/static/protocol-replication.html. > > > it's best to archive from the primary so a replication > > failure does not affect your archiving. > > > > Understood, just not something I can change in production primary at the > > moment. Hence looking to see about a quick one-off backup from standby. > > For a quick one-off, pg_basebackup is your friend. One of the requirements of this backup is encryption, which I don't see any notes for with pg_basebackup. Also, due to the size, I need parallel workers. pgBackRest gives me both of these. I need compression as well, but pg_basebackup does do that. I did come up with a sort of Rube Goldberg-esque workaround for now that uses a clone of the prod standby VM from Veeam backup as the backup source (after stopping recovery and opening it as a standalone DB). Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 9:21 AM, David Steele wrote: > > Yes, they are typically very small. The general exception to this rule > is if logs are stored in pg_log. I recommend storing logs out of the > PGDATA dir as they can be quite large and don't really make sense to > restore to another server. > > Files copied from the master will be marked as such in backup.manifest > (master:true) so you can check for yourself. > Good to know. And fortunately for this DB we do have pg_log (and pg_xlog) symlinked to different volumes outside of $PGDATA. > I did come up with a sort of Rube Goldberg-esque workaround for now > > involving using a clone of the prod standby VM from Veeam backup to use > > as the backup source (after stopping recovery and opening it as a > > standalone DB). > > You don't get PITR that way, of course, but at least it's a backup. As > long as your clone is consistent. Yes it's a crash-consistent snapshot-based backup. I've done quite a few restores from it and it works great. It can do PITR as well since I would have all the WAL files from prod needed to keep recovering. But for these cases I just recover it to the first consistent point and open it for testing (or backups in this case). Thanks for all your help! Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 12:39 PM, David Steele wrote: > > > I read "open it for testing (or backups in this case)" as letting > recovery complete and promoting the cluster to a master before taking > the backup. > > Don, is that the case? If it is, I think there's a problem with or > without a timeline switch. If you confirm the backup is being taken as > above then I'll detail my concerns. > Note that this is just for creating a couple of one-off backups to restore for our dev and pre-prod environments. Given that, I was going to open a new clone as its own cluster and take backups from that. The data would be the same though and suit purposes of the dev and pre-prod refreshes. If I were taking backups for the purpose of production backups, I would not do things this way. That is the eventual plan but right now we aren't ready to make the changes necessary in the production environment. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 10:17 AM, David Steele wrote: > If pg_log is symlinked to PGDATA it will be copied. pg_xlog is not > copied in any backup. So an external pg_log directory symlinked into $PGDATA will have its log contents copied? I'm curious, why even copy the pg_log logs? They aren't needed for database restore or recovery. Don. -- Don Seiler www.seiler.us
Re: pgBackRest backup from standby
On Mon, Feb 19, 2018 at 2:53 PM, David Steele wrote: > > Also, relocating the log directory is easy using the log_directory > setting, so that's what I recommend if it's an issue. Some users do > want to backup their logs. > > That's probably a much better idea than symlinking anyway. I'll look to do that in my next round of config changes. -- Don Seiler www.seiler.us
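A minimal sketch of that config change; the path here is purely illustrative:

    -- Point the logging collector at a directory outside PGDATA instead of symlinking pg_log
    ALTER SYSTEM SET log_directory = '/var/log/postgresql';
    SELECT pg_reload_conf();
    SHOW log_directory;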
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 11:23 AM Peter Geoghegan wrote: > > If a substantial amount of the index was written by CREATE INDEX (and > not by retail inserts) then my theory is unlikely to be correct. It > could just be that you managed to absorb most inserts in one > partition, but not in the other. That's probably possible when there > are only relatively small differences in the number of inserts that > need to use of the space left behind by fillfactor in each case. In > general page splits tend to come in distinct "waves" after CREATE > INDEX is run. > What do you mean by "absorb" the inserts? It sounds like the answer will be "No", but: Would rebuilding the index after the month-end (when inserts have stopped on this partition) change anything? Don. -- Don Seiler www.seiler.us
Index Partition Size Double of its Table Partition?
We're trying out a new non-unique covering index (with INCLUDE columns) on a couple of table partitions. We put the index on the partitions for last month and this month. Both table partitions have similar sizes (45-46 GB) and row counts (330-333 million). The covering index on last month's partition is 50GB, but this month's index is already 79GB. The table is basically write-only as well, so there shouldn't be any real bloat here. One thing worth mentioning is that the table has 4 columns; the index is on two of them and includes the other two. I can't think of an explanation for the index being so much larger than its table, especially compared to last month's index. Curious if anyone has any thoughts on what might be causing this. -- Don Seiler www.seiler.us
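One hedged way to quantify the gap is to compare sizes directly and check leaf density with pgstattuple; the partition and index names below are placeholders, not the real ones from this system:

    SELECT pg_size_pretty(pg_relation_size('orders_2024_10')) AS table_size,
           pg_size_pretty(pg_relation_size('orders_2024_10_cover_idx')) AS index_size;

    -- avg_leaf_density shows how full the leaf pages are; a freshly built btree at the
    -- default fillfactor sits near 90, while heavy page splitting drags it toward 50.
    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT avg_leaf_density, leaf_fragmentation
      FROM pgstatindex('orders_2024_10_cover_idx');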
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 10:45 AM Peter Geoghegan wrote: > > It sounds like you have no updates and deletes. Right? So the only > thing that could be different is the way that the pages are being > split (aside from variations in the width of index tuples, which seems > highly unlikely to be the only factor). > Correct, the table only sees inserts as far as DML goes. > The heuristics used to trigger the relevant behavior are fairly > conservative. I wonder if we should be more aggressive about it. > > > I should have also included that this is on PG 15 (currently 15.8 but we > created the indexes when it was still 15.7) on Ubuntu 22.04 LTS. > > That shouldn't matter, as far as this theory of mine is concerned. > Anything after 12 could be affected by the issue I'm thinking of. > Why would last month's index be so much smaller? Both indexes were created using CONCURRENTLY, as each was created during its month when we started testing. The September index was created toward the end of the month (Sep 26), whereas the October one was created Oct 1. Both table partitions are getting regularly autovacuum/autoanalyze work. Don. -- Don Seiler www.seiler.us
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 10:35 AM Peter Geoghegan wrote: > On Wed, Oct 30, 2024 at 11:24 AM Don Seiler wrote: > > One thing worth mentioning is that the table is 4 columns, the index is > on two of them and includes the other two. I can't think of an explanation > for the index being so much larger than its table, especially compared to > last month's index. > > > > Curious if anyone has any thoughts on what might be causing this. > > You mentioned that this has 4 columns. Sounds like this could be a > "locally monotonically increasing index". I wonder if you're > benefiting from this optimization, though only inconsistently: > > https://www.youtube.com/watch?v=p5RaATILoiE&t=2079s > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f21668f3 > > Of course, this is only a guess. I vaguely recall a complaint that > sounded vaguely like yours, also involving partitioning. > Thanks Peter, I'll look into that shortly. I should have also included that this is on PG 15 (currently 15.8 but we created the indexes when it was still 15.7) on Ubuntu 22.04 LTS. Don. -- Don Seiler www.seiler.us
Re: Index Partition Size Double of its Table Partition?
On Wed, Oct 30, 2024 at 4:59 PM David Mullineux wrote: > Are you able to cluster the table ? The idea is that rows ordered in the > same way as the index might reduce it's size ? > I'm not sure on this. There are other indexes on these table partitions as well. Another bit of useful info that I should have shared immediately is that this is a monthly partitioned table, going back years. We don't drop old partitions (yet) on this one. For now we've only added this index to a few individual partitions. The hope was to add it to all of them and then eventually the template (using an older version of pg_partman). Don. -- Don Seiler www.seiler.us
Re: Index Partition Size Double of its Table Partition?
On Thu, Oct 31, 2024 at 11:13 AM Rob Sargent wrote: > Whats the fill factor? > No fill factor is specified, so I'm assuming it's the default 90% for indexes. FYI we did a REINDEX for the index in question tonight. Since the index was for last month, there are no more writes to it so we didn't use CONCURRENTLY either. The size went from 83GB to 48GB, which also happens to be the size of the table partition. -- Don Seiler www.seiler.us
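For reference, a sketch of the checks and the rebuild described above; the index name is a placeholder:

    -- NULL reloptions means the index uses the default fillfactor (90 for btree)
    SELECT relname, reloptions
      FROM pg_class
     WHERE relname = 'orders_2024_09_cover_idx';

    -- Writes to this partition have stopped, so a plain (blocking) REINDEX is fine here;
    -- on a partition still receiving writes, REINDEX INDEX CONCURRENTLY (PG 12+) avoids blocking them.
    REINDEX INDEX orders_2024_09_cover_idx;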