is pg_advisory_lock() suitable for long runs

2018-03-31 Thread Radoslav Nedyalkov
Hi all,
It's a very simple and intuitive case, but let me describe it first.
1. session 1 calls pg_advisory_lock(1234) and succeeds.
2. session 2 calls pg_advisory_lock(1234) and stops on waiting.
All fine, BUT the pid for session 2 already shows up with a backend_xmin in
pg_stat_activity,
which means vacuum won't be able to remove rows newer than session 2's
backend_xmin.
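
For illustration, checking pg_stat_activity with something like this (just a
sketch) shows the blocked session still advertising a backend_xmin while it
waits:

-- hypothetical check while session 2 is blocked in pg_advisory_lock(1234)
SELECT pid, state, wait_event_type, wait_event, backend_xmin
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';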

Well, we planned to use pg_advisory_lock() during the boot phase of a hot-standby
appserver,
and apparently this will be problematic, as session 2 might wait for
weeks.

Any thoughts? Are we missing something?

Thanks and Regards
Radoslav Nedyalkov


create index insist on 2 workers only

2020-04-24 Thread Radoslav Nedyalkov
Hello everybody,

We're rebuilding a big table which has parallel_workers = 6 set.
The system has:
 max_parallel_maintenance_workers | 6   |
/var/lib/pgsql/11/data/postgresql.sumup.conf
 max_parallel_workers | 16  |
/var/lib/pgsql/11/data/postgresql.sumup.conf
 max_parallel_workers_per_gather  | 4   |
/var/lib/pgsql/11/data/postgresql.sumup.conf

Also, at session level during the index restore, there is:
set max_parallel_maintenance_workers = 6;

Still we get only 2 parallel processes on a system free of any other load.
It is postgres 11.7.

Where should we look?

Thank You very much

Rado


Re: create index insist on 2 workers only

2020-05-12 Thread Radoslav Nedyalkov
Just to reply for completeness:
maintenance_work_mem was high enough (4GB).
Apparently we had max_worker_processes = 18 and 16 subscriptions
running, leaving no room for additional worker processes. What an oversight.
Once max_worker_processes was bumped (to 42 on a 64-core machine), everything
started working as expected.
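
For anyone hitting the same thing, a rough way to spot it (just a sketch) is to
compare the worker budget with what is already in use:

-- overall budget
SELECT name, setting FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_maintenance_workers');

-- workers currently occupying slots
SELECT backend_type, count(*)
FROM pg_stat_activity
WHERE backend_type IN ('logical replication worker', 'parallel worker')
GROUP BY backend_type;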

On Sat, Apr 25, 2020 at 5:03 AM Peter Geoghegan  wrote:

> On Fri, Apr 24, 2020 at 7:32 AM Radoslav Nedyalkov 
> wrote:
> > We 're rebuilding a big table which has set parallel_workers = 6
> > system has
> >  max_parallel_maintenance_workers | 6   |
> /var/lib/pgsql/11/data/postgresql.sumup.conf
> >  max_parallel_workers | 16  |
> /var/lib/pgsql/11/data/postgresql.sumup.conf
> >  max_parallel_workers_per_gather  | 4   |
> /var/lib/pgsql/11/data/postgresql.sumup.conf
> >
> > Also session level on index restore there is
> > set max_parallel_maintenance_workers = 6;
> >
> > Still we get only 2 parallel processes in a free of any other load
> system.
> > It is postgres 11.7
>
> Try increasing maintenance_work_mem from the default of 64MB. MWM
> constrains the number of parallel workers used.
>
> --
> Peter Geoghegan
>


restart_lsn is not advancing

2020-06-30 Thread Radoslav Nedyalkov
Hello All,
we are in a situation where restart_lsn for logical replication slots is
not advancing.
We have two slots. Both are active, and confirmed_flush_lsn is also being updated.
On the client side all looks okay; remote_lsn for the subscriptions is advancing.

The effect of restart_lsn being 'stuck' is that the disk is filling up.
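
Something like this (just a sketch) shows how much WAL each slot is still
holding back and how old its catalog_xmin is:

SELECT slot_name, active, restart_lsn, confirmed_flush_lsn,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots;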

Any idea what could be the cause?
We will try to re-establish replication by dropping the current slots,
although we have to do it very carefully; we need to avoid the initial load
(this is 4T of data).

Thanks and regards,
Rado


Re: restart_lsn is not advancing

2020-06-30 Thread Radoslav Nedyalkov
One more detail: pg_replication_slots.catalog_xmin shows the biggest age
in the system.
How could a transaction on the catalogs affect a logical slot? Any clue
here?
Thank you,
Rado

On Wed, Jul 1, 2020 at 2:09 AM Radoslav Nedyalkov 
wrote:

> Hello All,
> we are in a situation where restart_lsn for logical replication slots is
> not advancing.
> We have two slots. Both  are active, confirmed_flush_lsn also is updated.
> Client side all looks okay. remote_lsn for subscritions is advancing.
>
> The effect of restart_lsn being 'stopped' is the disk is filling up.
>
> Any idea what could be the cause?
> We will try to re-establish replication by dropping current slots ,
> although we have to do it very carefully. We should avoid initial loading
> (these are 4T of data)
>
> Thanks and regards,
> Rado
>


single table - fighting a seq scan

2020-07-14 Thread Radoslav Nedyalkov
Hi Forum,
I'm scratching my head around the following case:

*te* is an 80M-row, 100GB table. The query is a bare simple select over an indexed
attribute of it.

EXPLAIN SELECT te.id FROM te WHERE te.current_pid IN (240900026,
 240900027,
 240900028,
 -- 200 entries ...

 Gather  (cost=1000.00..61517367.85 rows=3870 width=8)
   Workers Planned: 2
   ->  Parallel Seq Scan on te  (cost=0.00..61515980.85 rows=1612 width=8)
 Filter: (current_pid = ANY
('{240900026,240900027,...240901129}'::bigint[]))
Execution time is about 5 minutes

Reducing the number of current_pids to 100 changes the plan and it does an index
scan (101 still does a seq scan).

 Index Scan using te_current_pid_idx on te  (cost=0.57..731.26 rows=3832
width=8) (actual time=0.566..1.667 rows=600 loops=1)
   Index Cond: (current_pid = ANY
('{240900026,240900027,...240900194}'::bigint[]))
 Planning Time: 3.152 ms
 Execution Time: 1.732 ms

Selecting 200 pids rewritten with a CTE goes for the index too.

EXPLAIN ANALYZE
WITH cte as (
select * from unnest(ARRAY[
240900026,
 240900027,
 240900028,
...
 240901129
]))
SELECT te.id FROM te join cte on te.current_pid = cte.unnest;


   QUERY PLAN
-
 Nested Loop  (cost=1.58..1097.83 rows=3847 width=8) (actual
time=0.882..14.927 rows=1468 loops=1)
   CTE cte
 ->  Function Scan on unnest  (cost=0.00..1.00 rows=100 width=4)
(actual time=0.025..0.043 rows=205 loops=1)
   ->  CTE Scan on cte  (cost=0.00..2.00 rows=100 width=4) (actual
time=0.027..0.083 rows=205 loops=1)
   ->  Index Scan using te_current_pid_idx on te  (cost=0.57..10.57 rows=38
width=16) (actual time=0.011..0.071 rows=7 loops=205)
 Index Cond: (current_pid = cte.unnest)
 Planning Time: 2.022 ms
 Execution Time: 15.044 ms


I tried random_page_cost=1 and a couple of combinations with very low
cpu_index_tuple_cost and cpu_operator_cost, and only managed to get an index
scan for a few more IN entries.
I ran ANALYZE and bumped the stats target for current_pid to 5000. It did not help.

I'm out of ideas. What is the right approach to solve this?
Thank You!

Rado


Re: single table - fighting a seq scan

2020-07-14 Thread Radoslav Nedyalkov
Ah, I may have messed up the examples I gave; the row numbers are different.
Here are the plans once again, sorry about that.

-- 200 entries

 Gather  (cost=1000.00..106905910.97 rows=7893 width=8)
   Workers Planned: 2
   ->  Parallel Seq Scan on te  (cost=0.00..106904121.67 rows=3289 width=8)
 Filter: (current_pid = ANY
('{240900026,240900027,240900028,240900029,240900030,240900031,240900032,240900033,240900034,240900035,240900036,240900037,240900038,240900039,240900040,240900041,240900042,240900043,240900044,240900045,240900046,240900047,240900048,240900049,240900050,240900051,240900052,240900053,240900054,240900055,240900056,240900057,240900058,240900059,240900060,240900061,240900062,240900063,240900064,240900065,240900066,240900067,240900068,240900069,240900070,240900071,240900072,240900073,240900074,240900075,240900076,240900077,240900078,240900079,240900080,240900081,240900082,240900083,240900084,240900085,240900086,240900165,240900087,240900166,240900088,240900167,240900089,240900168,240900090,240900169,240900091,240900170,240900092,240900171,240900093,240900172,240900094,240900173,240900905,240900174,240900175,240900176,240900177,240900178,240900179,240900180,240900181,240900182,240900183,240900184,240900185,240900186,240900187,240900188,240900189,240900190,240900191,240900192,240900193,240900194,240900195,240900196,240900197,240900198,240900199,240900906,240900907,240900908,240900909,240900910,240900911,240900912,240900913,240900914,240900915,240900916,240900917,240900918,240900919,240900920,240900921,240900922,240900923,240900924,240900925,240900926,240900927,240900928,240901048,240901053,240901054,240901055,240901056,240901057,240901058,240901059,240901060,240901061,240901062,240901063,240901064,240901065,240901066,240901067,240901068,240901069,240901070,240901071,240901072,240901073,240901074,240901075,240901076,240901077,240901078,240901079,240901080,240901081,240901082,240901083,240901084,240901085,240901086,240901087,240901088,240901089,240901090,240901091,240901092,240901093,240901094,240901095,240901096,240901097,240901098,240901099,240901100,240901101,240901102,240901103,240901104,240901105,240901106,240901107,240901108,240901109,240901110,24090,240901112,240901113,240901114,240901115,240901116,240901117,240901118,240901119,240901120,240901121,240901122,240901123,240901124,240901125,240901126,240901127,240901128,240901129}'::bigint[]))
(4 rows)

Time: 5.261 ms




-- 100 entries
--

 Index Scan using te_current_pid_idx on te  (cost=0.57..731.26 rows=3832
width=8) (actual time=1.244..15.897 rows=600 loops=1)
   Index Cond: (current_pid = ANY
('{240900026,240900027,240900028,240900029,240900030,240900031,240900032,240900033,240900034,240900035,240900036,240900037,240900038,240900039,240900040,240900041,240900042,240900043,240900044,240900045,240900046,240900047,240900048,240900049,240900050,240900051,240900052,240900053,240900054,240900055,240900056,240900057,240900058,240900059,240900060,240900061,240900062,240900063,240900064,240900065,240900066,240900067,240900068,240900069,240900070,240900071,240900072,240900073,240900074,240900075,240900076,240900077,240900078,240900079,240900080,240900081,240900082,240900083,240900084,240900085,240900086,240900165,240900087,240900166,240900088,240900167,240900089,240900168,240900090,240900169,240900091,240900170,240900092,240900171,240900093,240900172,240900094,240900173,240900905,240900174,240900175,240900176,240900177,240900178,240900179,240900180,240900181,240900182,240900183,240900184,240900185,240900186,240900187,240900188,240900189,240900190,240900191,240900192,240900193,240900194}'::bigint[]))
 Planning Time: 3.430 ms
 Execution Time: 15.954 ms
(4 rows)

Time: 20.257 ms


On Tue, Jul 14, 2020 at 10:31 PM Radoslav Nedyalkov 
wrote:

> Hi Michael,
> full output from the query is attached.
> her

Re: single table - fighting a seq scan

2020-07-15 Thread Radoslav Nedyalkov
Shame on me. It's a partial index - *where ... is not null*.
With the *is not null* predicate in place, the planner always goes for the index
(tested with thousands of IN entries).
The CTE version always goes for the index too, even *without* *is not null*, which
led to a slight confusion.
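
For the archives, a minimal sketch of the fix (assuming the partial index
predicate is on current_pid):

SELECT te.id
FROM te
WHERE te.current_pid IN (240900026, 240900027 /* , ... */, 240901129)
  AND te.current_pid IS NOT NULL;  -- matches the partial index predicate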

Thanks Tom, Michael,
Best
Rado

On Wed, Jul 15, 2020 at 1:06 AM Tom Lane  wrote:

> Radoslav Nedyalkov  writes:
> > Ah, I could have messed up the examples I gave. Row numbers are
> different.
> > Once again the plans , sorry about that.
>
> Given that it works at 100 entries and not 101, I can't escape the
> suspicion that you're being burnt by predtest.c's MAX_SAOP_ARRAY_SIZE
> limit.  However, that only affects the planner's willingness to make
> constraint proofs involving the large IN clause, and nothing you've
> mentioned here explains why such a proof would be needed.  Is there
> something you're not telling us about this table's schema?  (I'm
> wondering if the index is partial, for instance, though one would
> think that the CTE form of the query wouldn't work either if so.)
>
> regards, tom lane
>


Could not open file pg_xact/0E97

2020-07-18 Thread Radoslav Nedyalkov
Hi Forum,
We see
pg_dump: Dumping the contents of table "transactions_and_fees_2020_01"
failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR:  could not access status of
transaction 3916900817
DETAIL:  Could not open file "pg_xact/0E97": No such file or directory.
(pg11.8 on centos 7.8, aws ec2 instance)

This pops up upon select from the table on the standby server.
However on the master server, we can dump the table without problem.

What is interesting: when listing the pg_xact folder, judging by the filename,
it belongs to already recycled files, i.e. it is older than the oldest
file there.
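
As a sanity check, the xid does map to that segment (a rough calculation,
assuming the default 8 kB block size: 2 status bits per transaction, 32768
transactions per page, 32 pages per segment = 1,048,576 transactions per file):

SELECT to_hex(3916900817 / 1048576);  -- 'e97', i.e. the missing pg_xact/0E97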

To work around the error, we created a zeroed commit file on the standby like
dd if=/dev/zero of=0E97 bs=256k count=1
and the full dump of the database then completed successfully.

Perhaps this is not quite the correct approach, so we're going to run
vacuum full on the master
to recreate the table.

Has anyone encountered this issue? Any advice?
Thank you

Rado


Re: Could not open file pg_xact/0E97

2020-07-18 Thread Radoslav Nedyalkov
Well, the vacuum full failed with:

vacuumdb: vacuuming of table "olap.transactions_and_fees_2020_01" in
database "db" failed: ERROR:  found xmin 3916900817 from before
relfrozenxid 80319533

On Sun, Jul 19, 2020 at 12:01 AM Radoslav Nedyalkov 
wrote:

> Hi Forum,
> We see
> pg_dump: Dumping the contents of table "transactions_and_fees_2020_01"
> failed: PQgetResult() failed.
> pg_dump: Error message from server: ERROR:  could not access status of
> transaction 3916900817
> DETAIL:  Could not open file "pg_xact/0E97": No such file or directory.
> (pg11.8 on centos 7.8, aws ec2 instance)
>
> This pops up upon select from the table on the standby server.
> However on the master server, we can dump the table without problem.
>
> What is interesting, when listing the pg_xact folder, according to the
> filename , it is from already recycled files. I.e. it is older than the
> oldest file there.
>
> For resolving the error , we  created commit file on the standby like
> dd if=/dev/zero of=0E97 bs=256k count=1
> and the full dump of the database went successfully.
>
> Perhaps this is not quite the correct approach, so we're gonna to run
> vacuum full on the master
> to recreate the table.
>
> Has anyone encountered this issue? Any advice?
> Thank you
>
> Rado
>


Re: Could not open file pg_xact/0E97

2020-07-21 Thread Radoslav Nedyalkov
Nope. It's an intensive OLAP data warehouse with queries hanging in waits
for hours.
From the vacuum full error and the commit file that is gone due to the
supposedly advanced age mark,
it looks more like a bug and not an I/O error.

Rado

On Tue, Jul 21, 2020 at 2:38 AM Jeremy Schneider 
wrote:

>
> On Jul 18, 2020, at 14:18, Radoslav Nedyalkov 
> wrote:
>
> 
> Well. the vacuum full failed with
>
> vacuumdb: vacuuming of table "olap.transactions_and_fees_2020_01" in
> database "db" failed: ERROR:  found xmin 3916900817 from before
> relfrozenxid 80319533
>
>>
> Do you have checksums enabled for this database?
>
> -Jeremy
>
> Sent from my TI-83
>
>


conflict with recovery when delay is gone

2020-11-13 Thread Radoslav Nedyalkov
Hi Forum,all
On a very busy master-standby setup which runs typical OLAP processing -
long-living, massively writing statements - we're getting on the standby:

 ERROR:  canceling statement due to conflict with recovery
 FATAL:  terminating connection due to conflict with recovery

The weird thing is that cancellations usually happen after the standby has
experienced some huge delay (2h), still below the allowed maximum (3h). Even
recently started statements get cancelled when the delay is already back at zero.

Sometimes the situation relaxes after an hour or so.
Restarting the server helps instantly.

It is pg11.8, centos7, huge pages enabled, shared_buffers 196G out of 748G.

What phenomenon could we be facing?

Thank you,
Rado


Re: conflict with recovery when delay is gone

2020-11-13 Thread Radoslav Nedyalkov
On Fri, Nov 13, 2020 at 7:37 PM Laurenz Albe 
wrote:

> On Fri, 2020-11-13 at 15:24 +0200, Radoslav Nedyalkov wrote:
> > On a very busy master-standby setup which runs typical olap processing -
> > long living , massive writes statements,  we're getting on the standby:
> >
> >  ERROR:  canceling statement due to conflict with recovery
> >  FATAL:  terminating connection due to conflict with recovery
> >
> > The weird thing is that cancellations happen usually after standby has
> experienced
> > some huge delay(2h), still not at the allowed maximum(3h). Even recently
> run statements
> > got cancelled when the delay is already at zero.
> >
> > Sometimes the situation got relaxed after an hour or so.
> > Restarting the server instantly helps.
> >
> > It is pg11.8, centos7, hugepages, shared_buffers 196G from 748G.
> >
> > What phenomenon could we be facing?
>
> Hard to say.  Perhaps an unusual kind of replication conflict?
>
> What is in "pg_stat_database_conflicts" on the standby server?
>

db01=# select * from pg_stat_database_conflicts;
 datid |  datname  | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
-------+-----------+------------------+------------+----------------+-----------------+----------------
 13877 | template0 |                0 |          0 |              0 |               0 |              0
 16400 | template1 |                0 |          0 |              0 |               0 |              0
 16402 | postgres  |                0 |          0 |              0 |               0 |              0
 16401 | db01      |                0 |          0 |             51 |               0 |              0
(4 rows)

On a freshly restarted standby we've just seen similar behaviour after a 2-hour
delay and a slow catch-up:
confl_snapshot is 51 and we have exactly the same number of cancelled
statements.


Re: conflict with recovery when delay is gone

2020-11-14 Thread Radoslav Nedyalkov
On Fri, Nov 13, 2020 at 8:13 PM Radoslav Nedyalkov 
wrote:

>
>
> On Fri, Nov 13, 2020 at 7:37 PM Laurenz Albe 
> wrote:
>
>> On Fri, 2020-11-13 at 15:24 +0200, Radoslav Nedyalkov wrote:
>> > On a very busy master-standby setup which runs typical olap processing -
>> > long living , massive writes statements,  we're getting on the standby:
>> >
>> >  ERROR:  canceling statement due to conflict with recovery
>> >  FATAL:  terminating connection due to conflict with recovery
>> >
>> > The weird thing is that cancellations happen usually after standby has
>> experienced
>> > some huge delay(2h), still not at the allowed maximum(3h). Even
>> recently run statements
>> > got cancelled when the delay is already at zero.
>> >
>> > Sometimes the situation got relaxed after an hour or so.
>> > Restarting the server instantly helps.
>> >
>> > It is pg11.8, centos7, hugepages, shared_buffers 196G from 748G.
>> >
>> > What phenomenon could we be facing?
>>
>> Hard to say.  Perhaps an unusual kind of replication conflict?
>>
>> What is in "pg_stat_database_conflicts" on the standby server?
>>
>
> db01=# select * from pg_stat_database_conflicts;
>  datid |  datname  | confl_tablespace | confl_lock | confl_snapshot |
> confl_bufferpin | confl_deadlock
>
> ---+---+--+++-+
>  13877 | template0 |0 |  0 |  0 |
>   0 |  0
>  16400 | template1 |0 |  0 |  0 |
>   0 |  0
>  16402 | postgres  |0 |  0 |  0 |
>   0 |  0
>  16401 | db01  |0 |  0 | 51 |
>   0 |  0
> (4 rows)
>
> On a freshly restarted standby we've just got similar behaviour after a 2
> hours delay and a slow catch-up.
> confl_snapshots is 51 and we have exactly the same number cancelled
> statements.
>
>
No luck so far. Searching for an explanation, I found we fall into the
unexplained case where
snapshot conflicts happen even though hot_standby_feedback is on.
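
For completeness, the knobs that govern this kind of conflict can be listed
with something like this (just a sketch):

SELECT name, setting
FROM pg_settings
WHERE name IN ('hot_standby_feedback',
               'max_standby_streaming_delay',
               'max_standby_archive_delay',
               'vacuum_defer_cleanup_age',
               'old_snapshot_threshold');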

Thanks,
Rado


Re: conflict with recovery when delay is gone

2020-11-15 Thread Radoslav Nedyalkov
On Sun, Nov 15, 2020 at 12:48 AM Mohamed Wael Khobalatte <
mkhobala...@grubhub.com> wrote:

>
>
> On Sat, Nov 14, 2020 at 2:46 PM Radoslav Nedyalkov 
> wrote:
>
>>
>>
>> On Fri, Nov 13, 2020 at 8:13 PM Radoslav Nedyalkov 
>> wrote:
>>
>>>
>>>
>>> On Fri, Nov 13, 2020 at 7:37 PM Laurenz Albe 
>>> wrote:
>>>
>>>> On Fri, 2020-11-13 at 15:24 +0200, Radoslav Nedyalkov wrote:
>>>> > On a very busy master-standby setup which runs typical olap
>>>> processing -
>>>> > long living , massive writes statements,  we're getting on the
>>>> standby:
>>>> >
>>>> >  ERROR:  canceling statement due to conflict with recovery
>>>> >  FATAL:  terminating connection due to conflict with recovery
>>>> >
>>>> > The weird thing is that cancellations happen usually after standby
>>>> has experienced
>>>> > some huge delay(2h), still not at the allowed maximum(3h). Even
>>>> recently run statements
>>>> > got cancelled when the delay is already at zero.
>>>> >
>>>> > Sometimes the situation got relaxed after an hour or so.
>>>> > Restarting the server instantly helps.
>>>> >
>>>> > It is pg11.8, centos7, hugepages, shared_buffers 196G from 748G.
>>>> >
>>>> > What phenomenon could we be facing?
>>>>
>>>> Hard to say.  Perhaps an unusual kind of replication conflict?
>>>>
>>>> What is in "pg_stat_database_conflicts" on the standby server?
>>>>
>>>
>>> db01=# select * from pg_stat_database_conflicts;
>>>  datid |  datname  | confl_tablespace | confl_lock | confl_snapshot |
>>> confl_bufferpin | confl_deadlock
>>>
>>> ---+---+--+++-+
>>>  13877 | template0 |0 |  0 |  0 |
>>> 0 |  0
>>>  16400 | template1 |0 |  0 |  0 |
>>> 0 |  0
>>>  16402 | postgres  |0 |  0 |  0 |
>>> 0 |  0
>>>  16401 | db01  |0 |  0 | 51 |
>>> 0 |  0
>>> (4 rows)
>>>
>>> On a freshly restarted standby we've just got similar behaviour after a
>>> 2 hours delay and a slow catch-up.
>>> confl_snapshots is 51 and we have exactly the same number cancelled
>>> statements.
>>>
>>>
>> No luck so far. Searching for the explanation i found we fail into the
>> unexplained case when
>> snapshot conflicts happen even hot_standby_feedback is on.
>>
>> Thanks,
>> Rado
>>
>>
>
> Perhaps you have a value set for old_snapshot_threshold? If not, do the
> walreceiver connections drop out?
>

old_snapshot_threshold is -1 on both master and replica.
walreceiver does not drop.


set column statistics to max does not help

2022-04-21 Thread Radoslav Nedyalkov
Hello all,
We're stuck with a wrong plan that the planner insists on (pg 14.2).
It's an aggregation over a single table.
The planner always goes for an extra scan over a partial index.
We have set statistics on the relevant columns to the maximum and still
could not get correct row estimates. None of the cost* settings helped either.
Disabling bitmapscan (see below) brings up the correct plan. Do we have another
option here?
Maybe it is a more general question of what to do when the statistics
samples cannot capture the right numbers?
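
A sketch of the bitmapscan workaround, scoped with SET LOCAL so it does not
affect other statements:

BEGIN;
SET LOCAL enable_bitmapscan = off;  -- steers the planner away from the partial-index bitmap scan
-- ... the aggregation query ...
COMMIT;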

Thank you in advance for your help,

Rado


Here are the details:
See the partial index that matches one of the query's WHERE clauses:
"transaction_events_0100_0200_merchant_id_id_idx" btree (merchant_id, id)
WHERE (event_type::text = ANY (ARRAY['REFUND'::character varying::text,
'CHARGE_BACK'::character varying::text])) AND (current_status_id <> ALL
(ARRAY[24, 10]))

the distribution of event_type is
  count   | event_type
----------+-------------
    14908 | CHARGE_BACK
   134007 | REFUND
 99846581 | PAYOUT
(3 rows)

*** The table:
test=# \d+ transaction_events_0100_0200
                          Table "public.transaction_events_0100_0200"
           Column           |            Type             | Nullable | Storage  | Stats target
----------------------------+-----------------------------+----------+----------+--------------
 id                         | bigint                      | not null | plain    |
 transaction_id             | bigint                      |          | plain    | 1
 event_type                 | character varying(255)      |          | extended | 1
 event_amount               | numeric(12,2)               |          | main     |
 current_status_id          | integer                     |          | plain    | 1
 payout_due_date            | date                        |          | plain    |
 actual_payout_date         | date                        |          | plain    |
 current_payout_event_id    | integer                     |          | plain    |
 created_at                 | timestamp without time zone | not null | plain    |
 updated_at                 | timestamp without time zone | not null | plain    |
 installment_number         | integer                     |          | plain    |
 installments_count         | integer                     |          | plain    |
 fixed_fee                  | numeric(12,2)               |          | main     |
 acceleration_fee           | numeric(12,2)               |          | main     |
 processing_fee             | numeric(12,2)               |          | main     |
 origin_event_id            | bigint                      |          | plain    |
 destination_event_id       | bigint                      |          | plain    |
 payout_deduct_status       | character varying           |          | extended |
 merchant_id                | integer                     | not null | plain    | 1000
 current_merchant_payout_id | bigint                      |          | plain    |
Indexes:
"transaction_events_0100_0200_pkey" PRIMARY KEY, btree (id)
"transaction_events_0100_0200_current_status_id_transaction__idx" btree
(current_status_id, transaction_id) WHERE current_status_id <> ALL
(ARRAY[24, 10])
"transaction_events_0100_0200_merchant_id_id_idx" btree (merchant_id,
id) WHERE (event_type::text = ANY (ARRAY['REFUND'::character varying::text,
'CHARGE_BACK'::character varying::text])) AND (current_status_id <> ALL
(ARRAY[24, 10]))
"transaction_events_0100_0200_merchant_id_transaction_id_idx" btree
(merchant_id, transaction_id) WHERE current_status_id <> ALL (ARRAY[24, 10])
"transaction_events_0100_0200_transaction_id_idx" btree
(transaction_id) WITH (fillfactor='100')
"transaction_events_0100_0200_transaction_id_idx1" btree
(transaction_id) WHERE event_type::text = 'CHARGE_BACK'::text AND
payout_deduct_status::text = 'PENDING'::text

*** The query:
SEL

partitioned tables in high OLTP db

2022-04-27 Thread Radoslav Nedyalkov
Hello,

I'd like to ask for your opinion and any experience you might have with
partitioned tables on a production OLTP database.
We have a couple of tables reaching 1T and beyond and would like to have them
partitioned.
We built a pg14 full-volume db.
To get an impression of the new performance we're executing the most frequent
OLTP queries
and comparing plans and execution times with the production "monolithic"
pg12 db.
For most of the queries we managed to get the optimal plans.
Execution times however increased, especially planning time - say from
1ms to 6ms/10ms.
Typically there are joins between 3-6 partitioned (~40 partitions) tables.

That's the only thing we can examine; no load/stress tests are available.

Is there something else we should consider? (I saw lock contention has been noted
in the forum.)

Thank you for your time.

Regards,
Rado


Initial load from standby in logical replication.

2019-01-31 Thread Radoslav Nedyalkov
Hi All,
We want to set up a logical replica with the initial load taken from a physical
standby (all pg10.5).
Basically what we tested is:
1. on the master: create the publication and a logical slot.
2. on the standby: pause the standby (the lsn from 1. is already replayed).
3. on the standby: get the last replayed lsn, pg_last_wal_replay_lsn().
4. export/import the data into the logical replica.
5. on the logical replica: create the subscription,
with the origin pg_replication_origin_advance'd to the lsn from 3.
6. on the logical replica: enable the subscription.

On a test setup with some bulk inserts the results are encouraging.
Still, we're not that confident. Does the procedure look okay?
What if the standby is paused at a COMMIT lsn?
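
A rough sketch of steps 2-6 (names and the lsn value are placeholders; the real
lsn comes from step 3):

-- on the standby
SELECT pg_wal_replay_pause();
SELECT pg_last_wal_replay_lsn();   -- say it returns 0/AB12CD34

-- export/import the data, then on the logical replica:
CREATE SUBSCRIPTION sub1
  CONNECTION 'host=master dbname=db user=repl'
  PUBLICATION pub1
  WITH (create_slot = false, slot_name = 'slot1',
        enabled = false, copy_data = false);

-- the origin name is 'pg_' || the subscription oid
SELECT pg_replication_origin_advance('pg_' || oid::text, '0/AB12CD34')
FROM pg_subscription WHERE subname = 'sub1';

ALTER SUBSCRIPTION sub1 ENABLE;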

Thanks!
Radoslav Nedyalkov


Re: Initial load from standby in logical replication.

2019-01-31 Thread Radoslav Nedyalkov
On Thu, Jan 31, 2019 at 4:09 PM Achilleas Mantzios <
ach...@matrix.gatewaynet.com> wrote:

> On 31/1/19 3:55 μ.μ., Radoslav Nedyalkov wrote:
> > Hi All,
> > We want to setup a logical replica with initial load from physical
> standby. (all pg10.5)
> > Basically what we tested is:
> > 1.on the master: create publication and logical slot.
> > 2.on the standby: pause the stanby. (lsn from 1. is already replayed).
> > 3.on the standby: get last replayed lsn, pg_last_wal_replay_lsn().
> > 4.export/import the data into the logical replica.
> > 5.on the logical replica: create subscription
> with pg_replication_origin_advance'd to the lsn from 3.
> > 6.on the logical replica: enable subscription.
> >
> > On a test setup with some bulk inserts results are encouraging.
> > Still we're not that confident. Does the procedure looks okay?
> > What if standby is paused at COMMIT lsn?
> Who is gonna consume data from the primary's replication slot? If there is
> no subscriber connected and reading data, then the primary will think that
> the subscriber is down and will accumulate WALs
> till some replication client is connected to it (which will not happen
> since the subscriber points to the standby).
>
Yes, that's clear. The primary will accumulate WALs until the logical replica
connects to it.
The point is to offload the massive initial copy from the primary.


> > Thanks!
> > Radoslav Nedyalkov
> >
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>
>


subscription broken after upgrade to pg11

2019-03-19 Thread Radoslav Nedyalkov
Hi All,
We're testing an upgrade from postgres 10.6 to postgres 11.2 with pg_upgrade.
Before stopping pg10 we disabled the subscription.
After upgrading and launching pg11, the existing logical replication
subscription is there and disabled.

stest=# \dRs+
 List of subscriptions
 Name |  Owner   | Enabled | Publication | Synchronous commit |
Conninfo
--+--+-+-++-
 sub1 | postgres | f   | {pub2}  | off|
host=10.2.5.8 dbname=test user=postgres
(1 row)

However after enabling it,  the target table does not get updates,
although the subscription looks okay according to the status below.

stest=# alter subscription sub1 enable;
ALTER SUBSCRIPTION
# no updates here

stest=# \dRs+
List of subscriptions
 Name |  Owner   | Enabled | Publication | Synchronous commit |
Conninfo
--+--+-+-++-
 sub1 | postgres | t   | {pub2}  | off|
host=10.2.5.8 dbname=test user=postgres

stest=# select * from pg_subscription;
 subdbid | subname | subowner | subenabled |   subconninfo
 | subslotname | subsynccommit | subpublications
-+-+--++-+-+---+-
   16402 | sub1|   10 | t  | host=10.2.5.8 dbname=test
user=postgres | sub1| off   | {pub2}

stest=# select * from pg_stat_subscription;
 subid | subname | pid  | relid | received_lsn | last_msg_send_time |
 last_msg_receipt_time | latest_end_lsn |latest_end_time
---+-+--+---+--++---++---
 16413 | sub1| 2810 |   | 0/145C3400   ||
2019-03-19 16:23:23.650073+00 | 0/145C3320 | 2019-03-19
16:23:23.446651+00

Issuing a refresh
stest=# alter subscription sub1 refresh publication with (copy_data =
false);
ALTER SUBSCRIPTION

resumes updates, but with a gap in the data. Everything up to the refresh
statement is missing on the target.

What are we doing wrong?

Thanks,
Radoslav


Re: subscription broken after upgrade to pg11

2019-03-20 Thread Radoslav Nedyalkov
On Tue, Mar 19, 2019 at 10:37 PM Adrian Klaver 
wrote:

> On 3/19/19 9:35 AM, Radoslav Nedyalkov wrote:
> > Hi All,
> > We're testing upgrade from postgres 10.6 to postgres 11.2 with
> pg_upgrade.
> > Before stopping pg10 we disabled subscription.
> > After upgrade and launching pg11,  the existing logical replication
> > subscription is there and disabled.
> >
> > stest=# \dRs+
> >   List of subscriptions
> >   Name |  Owner   | Enabled | Publication | Synchronous commit |
> >  Conninfo
> >
> --+--+-+-++-
> >   sub1 | postgres | f   | {pub2}  | off|
> > host=10.2.5.8 dbname=test user=postgres
> > (1 row)
> >
> > However after enabling it,  the target table does not get updates,
> > although the subscription looks okay according to the status below.
> >
> > stest=# alter subscription sub1 enable;
> > ALTER SUBSCRIPTION
> > # no updates here
> >
> > stest=# \dRs+
> >  List of subscriptions
> >   Name |  Owner   | Enabled | Publication | Synchronous commit |
> >  Conninfo
> >
> --+--+-+-++-
> >   sub1 | postgres | t   | {pub2}  | off|
> > host=10.2.5.8 dbname=test user=postgres
> >
> > stest=# select * from pg_subscription;
> >   subdbid | subname | subowner | subenabled |   subconninfo
> >   | subslotname | subsynccommit | subpublications
> >
> -+-+--++-+-+---+-
> > 16402 | sub1|   10 | t  | host=10.2.5.8 dbname=test
> > user=postgres | sub1| off   | {pub2}
> >
> > stest=# select * from pg_stat_subscription;
> >   subid | subname | pid  | relid | received_lsn | last_msg_send_time |
> > last_msg_receipt_time | latest_end_lsn |latest_end_time
> >
> ---+-+--+---+--++---++---
> >   16413 | sub1| 2810 |   | 0/145C3400   ||
> > 2019-03-19 16:23:23.650073+00 | 0/145C3320 | 2019-03-19
> > 16:23:23.446651+00
> >
> > Issuing a refresh
> > stest=# alter subscription sub1 refresh publication with (copy_data =
> > false);
> > ALTER SUBSCRIPTION
> >
> > resumes updates , but with a gap in data. Everything up-to to the
> > refresh statement is missing in the target.
> >
> > What we're doing wrong ?
>
> https://www.postgresql.org/docs/11/sql-altersubscription.html
>
> REFRESH PUBLICATION
>
>  Fetch missing table information from publisher. This will start
> replication of tables that were added to the subscribed-to publications
> since the last invocation of REFRESH PUBLICATION or since CREATE
> SUBSCRIPTION.
>
>  refresh_option specifies additional options for the refresh
> operation. The supported options are:
>
>  copy_data (boolean)
>
>  Specifies whether the existing data in the publications that
> are being subscribed to should be copied once the replication starts.
> The default is true.
>
>
> Try with:
>
> copy_data=true
>
>
I don't think the focus is on REFRESH here. It is the existing subscription
that should resume and apply changes without discrepancies in the flow.
First I tried simply re-creating the subscription while retaining the replication
slot on the source. However, the slot re-sent already applied changes.
Obviously the exact LSN was lost somehow during the upgrade (how? should it be?).
So the solution:

On the target before upgrade - disable subscription and get remote_lsn.

stest=# alter subscription sub1 disable;
ALTER SUBSCRIPTION
stest=# select * from pg_replication_origin_status ;
 local_id | external_id | remote_lsn | local_lsn
--+-++---
        1 | pg_16473    | 0/146E41E0 | 0/2ABDC48
(1 row)

 Upgrade here.

On the target after upgrade execute:

stest=# alter subscription sub1 set (slot_name = NONE);
ALTER SUBSCRIPTION
stest=# drop subscription sub1 ;
DROP SUBSCRIPTION
stest=# create subscription sub1 CONNECTION 'host=10.2.5.8 dbname=test
user=postgres' PUBLICATION pub2 with (slot_name = sub1, create_slot=false,
enabled=false, copy_data=false);
CREATE SUBSCRIPTION
stest=# select oid,* from pg_subscription;
  oid  | subdbid | subname | subowner | subenabled |
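
Presumably the procedure then advances the new subscription's replication origin
to the remote_lsn saved before the upgrade and re-enables it; roughly (the origin
name being 'pg_' || the new subscription oid):

stest=# select pg_replication_origin_advance('pg_' || oid::text, '0/146E41E0')
        from pg_subscription where subname = 'sub1';
stest=# alter subscription sub1 enable;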

how to limit statement memory allocation

2021-03-09 Thread Radoslav Nedyalkov
Hi all,

Occasionally we get bad queries on our db that consume a lot of memory.
These are typically accidental full joins or just too-large result sets.
My understanding is that these should spill to a temp file, but apparently memory
allocation is preferred.
Last time a statement hit 150GB of RAM and did not respond to a cancel or terminate
signal, so we had to
restart the db after a few hours, once another one popped up.

The only relevant settings on the server, I guess, are:
huge_pages = 'on'
shared_buffers = 196GB
work_mem = 1GB
maintenance_work_mem = 4GB
effective_cache_size = 720GB

Thank you in advance,
Rado


Re: how to limit statement memory allocation

2021-03-09 Thread Radoslav Nedyalkov
On Tue, Mar 9, 2021 at 6:03 PM Tom Lane  wrote:

> Radoslav Nedyalkov  writes:
> > Occasionally  we get bad queries on our db that consume a lot of memory.
> > These typically are full joins by mistake or just too large result sets.
> > My understanding is these should go to a temp file but apparently memory
> > allocation is preferred.
>
> Perhaps the accumulation is happening on the client side?  libpq doesn't
> have any provision for spilling a result set to disk.
>
> If that's it, you could consider revising your application to read results
> row-at-a-time, although that might require a good deal of effort.
>
> https://www.postgresql.org/docs/current/libpq-single-row-mode.html
>
> regards, tom lane
>

Ah, perhaps I wrongly called it a result set.
These are queries that are part of a larger ETL function or statement which at the
end just writes to a table.
The huge join is an intermediate step.

So I'm still wondering.

Thanks and regards,
Rado


Re: how to limit statement memory allocation

2021-03-09 Thread Radoslav Nedyalkov
On Tue, Mar 9, 2021 at 6:53 PM Tom Lane  wrote:

> Radoslav Nedyalkov  writes:
> > On Tue, Mar 9, 2021 at 6:03 PM Tom Lane  wrote:
> >> Perhaps the accumulation is happening on the client side?  libpq doesn't
> >> have any provision for spilling a result set to disk.
>
> > Ah, I named it result set wrongly perhaps.
> > These are queries , part of a larger ETL function or statement which at
> the
> > end just write to a table.
> > The huge join is an intermediate step.
>
> Hm.  What's the query plan look like?
>
> The only thing I'm aware of that could consume unbounded memory
> on the server side is hash aggregation.  (v13 has improved that
> situation, but I'm guessing you are running some older version.)
> The planner won't choose hash aggregation if it estimates that
> the hash table would need to exceed work_mem, but its estimate
> could be wrong.  If that's the scenario, you could back off
> work_mem until hash aggregation isn't used, or you could try to
> improve the planner's estimates.  If your ETL process involves
> intermediate tables, you might need to explicitly ANALYZE those
> after you fill them so that the planner has a better idea how
> to plan the later steps.
>
> regards, tom lane


Thanks Tom,
Query and plan attached.

Rado

db=# EXPLAIN 
db-# CREATE TEMP TABLE gross_set_merchants AS (
db(# WITH gs_merchants as (
db(# SELECT
db(# u.merchant_id,
db(# uel.user_id
db(# FROM public.user_event_logs as uel
db(# JOIN users as u ON uel.user_id = u.id
db(# where 
db(# --u.merchant_id = 1030616
db(# uel.created_at >= '2018-01-01'
db(# AND uel.category = 'settings'
db(# AND uel.description IN ('Gross settlement 
enabled','Gross settlement disabled')
db(# GROUP BY 1,2
db(# ), setting_change as (
db(# SELECT
db(# gsm.merchant_id,
db(# cer.date,
db(# SUM(CASE 
db(# WHEN uel.description = 'Gross 
settlement enabled' THEN 1
db(# WHEN uel.description = 'Gross 
settlement disabled' THEN -1
db(# ELSE 0
db(# END) change_setting
db(# FROM external.currency_exchange_rates as cer
db(# CROSS JOIN gs_merchants as gsm --full join, having 
all days for all gs merchants
db(# LEFT JOIN public.user_event_logs as uel ON 
uel.created_at::date = cer.date 
db(# AND gsm.user_id = uel.user_id 
db(# AND uel.created_at >= '2018-01-01'
db(# AND uel.category = 'settings'
db(# AND uel.description IN ('Gross settlement 
enabled','Gross settlement disabled')
db(# WHERE cer.currency = 'GBP'
db(# AND cer.date >= '2018-01-01'
db(# GROUP BY 1,2
db(# 
db(# )
db(# 
db(# SELECT
db(# sc.merchant_id,
db(# sc.date,
db(# SUM(change_setting) OVER (PARTITION BY 
sc.merchant_id ORDER BY date) as gs_on
db(# FROM setting_change as sc); 

 QUERY PLAN 

 
-
 WindowAgg  (cost=183885316.97..183890372.97 rows=252800 width=40)
   CTE gs_merchants
 ->  Group  (cost=2475689.43..2534384.63 rows=567218 width=8)
   Group Key: u.merchant_id, uel.user_id
   ->  Gather Merge  (cost=2475689.43..2532021.22 rows=472682 width=8)
 Workers Planned: 2
 ->  Group  (cost=2474689.41..2476461.97 rows=236341 width=8)
   Group 

Re: Logical Replication, CPU load and Locking contention

2021-03-10 Thread Radoslav Nedyalkov
On Wed, Mar 10, 2021 at 3:56 AM Martín Fernández 
wrote:

> Hello,
>
> I’m troubleshooting a problem at my company with a pg 12 cluster that we
> run.
>
> We are using Amazon DMS to replicate data from our database into S3
> buckets.  DMS replicates data by using logical replication slots.
>
> After introducing DMS in our environment, we have seen an increase in CPU
> load of 20 points at business hours (from ~60% to ~80%).
>
> The other thing that we have identified is that AccessShareLocks increase
> considerably when DMS running.
>
> Based on this information, I’m trying to understand if this is something
> expected when running logical replication or not. We’ve been running
> physical replication for several years and we haven’t seen nothing like
> this. It could be the case that the issue is not related at all with
> logical replication and is purely a DMS artifact.
>
> Thanks before hand!
>
> Best,
> Martín
>
>
Hi,

I would check in pg_stat_activity what those logical replication slots are doing.
I guess COPY.
Are you doing a one-shot copy? Every day? Then copying all the tables will
lead to a load increase.
How many tables does DMS copy at a time? It should be configurable.
An AccessShareLock is absolutely normal: you have a transaction doing a SELECT
(COPY) over a table.
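
For example, something like this (just a sketch) shows what the walsenders are
busy with:

SELECT pid, backend_type, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE backend_type = 'walsender';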

Physical replication is a different thing. It happens at another level.

Regards
Rado


Re: how to limit statement memory allocation

2021-03-10 Thread Radoslav Nedyalkov
On Tue, Mar 9, 2021 at 8:13 PM Tom Lane  wrote:

> Radoslav Nedyalkov  writes:
> > On Tue, Mar 9, 2021 at 6:53 PM Tom Lane  wrote:
> >> The only thing I'm aware of that could consume unbounded memory
> >> on the server side is hash aggregation.  (v13 has improved that
> >> situation, but I'm guessing you are running some older version.)
> >> The planner won't choose hash aggregation if it estimates that
> >> the hash table would need to exceed work_mem, but its estimate
> >> could be wrong.
>
> > Query and plan attached.
>
> Yeah, so, this looks suspicious:
>
>  ->  HashAggregate  (cost=181320662.52..181323190.52 rows=252800
> width=16)
>Group Key: gsm.merchant_id, cer.date
>->  Merge Left Join  (cost=161250580.17..170174828.82
> rows=89196 width=71)
>
> How accurate is that estimate for the number of merchant_id/date
> groups?  Is the estimate for the size of the input join reasonable?
>
> Assuming this is the problem, a brute-force fix could be to turn off
> enable_hashagg.  But it'd be better to try to get the planner's
> estimates more in line with reality.
>
> regards, tom lane
>

Merely taking the number of rows from the cross join inputs, N*M gives 1.4B
rows, plus some multiplication from the left join;
that might be the 'surprising' part - the rows seem to be underestimated. The team
has already gone off to rewrite the query.
Thanks for pointing out HashAggregate as a potential danger.

Thanks and regards,
Rado


standby fails with out-of-order XID insertion

2021-03-31 Thread Radoslav Nedyalkov
Hi all,
So we have a master (pg11.8) and standbyA (pg11.11), 24T of data, a quite busy
data warehouse on Amazon Linux.
We've built a standbyB (pg11.11) by creating a basebackup from standbyA and
collecting WALs from the master.
When the basebackup completed we started recovery from the collected WALs.
When standbyB caught up we tried to switch to streaming:
1. created slot on the master
2. activated slot (with short pg_receivewal invocation)
3. did a checkpoint on the master (not sure why)
4. cancelled pg_receivewal.
5. stopped the standbyB and started it. Startup fails with
FATAL:  out-of-order XID insertion in KnownAssignedXids
More log attached.

After switching back to log shipping, the same error occurs.

Any clues why/what is the error ?

Thanks and regards,
Radoslav
2021-03-31 14:22:36 UTC [41623] 0: [8-1] user=,db= host=,app=[] LOG:  received 
fast shutdown request
2021-03-31 14:22:36 UTC [41623] 0: [9-1] user=,db= host=,app=[] LOG:  aborting 
any active transactions
2021-03-31 14:22:37 UTC [41642] 0: [1483-1] user=,db= host=,app=[] LOG:  
restartpoint complete: wrote 753696 buffers (2.9%); 0 WAL file(s) added, 0 
removed, 345 recycled; write=122.672 s, sync=0.063 s, total=122.871 s; sync 
files=2588, longest=0.001 s, average=0.001 s; distance=6276805 kB, 
estimate=7385309 kB
2021-03-31 14:22:37 UTC [41642] 0: [1484-1] user=,db= host=,app=[] LOG:  
recovery restart point at A8EA1/298E6BB8
2021-03-31 14:22:37 UTC [41642] 0: [1485-1] user=,db= host=,app=[] DETAIL:  
Last completed transaction was at log time 2021-03-31 14:22:35.091945+00.
2021-03-31 14:22:38 UTC [41642] 0: [1486-1] user=,db= host=,app=[] LOG:  
shutting down
2021-03-31 14:22:38 UTC [41642] 0: [1487-1] user=,db= host=,app=[] LOG:  
restartpoint starting: shutdown immediate
2021-03-31 14:22:41 UTC [41642] 0: [1488-1] user=,db= host=,app=[] LOG:  
restartpoint complete: wrote 304044 buffers (1.2%); 0 WAL file(s) added, 0 
removed, 0 recycled; write=3.318 s, sync=0.027 s, total=3.348 s; sync 
files=2154, longest=0.001 s, average=0.001 s; distance=4314192 kB, 
estimate=7078198 kB
2021-03-31 14:22:41 UTC [41642] 0: [1489-1] user=,db= host=,app=[] LOG:  
recovery restart point at A8EA2/30DFAEA8
2021-03-31 14:22:41 UTC [41642] 0: [1490-1] user=,db= host=,app=[] DETAIL:  
Last completed transaction was at log time 2021-03-31 14:22:35.091945+00.
2021-03-31 14:22:41 UTC [41623] 0: [10-1] user=,db= host=,app=[] LOG:  database 
system is shut down
2021-03-31 14:23:03 UTC [84999] 0: [1-1] user=,db= host=,app=[] LOG:  database 
system was shut down in recovery at 2021-03-31 14:22:41 UTC
2021-03-31 14:23:03 UTC [84999] 0: [2-1] user=,db= host=,app=[] LOG:  entering 
standby mode
2021-03-31 14:23:03 UTC [84999] 0: [3-1] user=,db= host=,app=[] LOG:  recovered 
replication state of node 1 to 526/409B7D28
2021-03-31 14:23:03 UTC [84999] 0: [4-1] user=,db= host=,app=[] LOG:  recovered 
replication state of node 2 to 8B75/A97FC4E0
2021-03-31 14:23:03 UTC [84999] 0: [5-1] user=,db= host=,app=[] LOG:  recovered 
replication state of node 19 to 580/B02F81A8
2021-03-31 14:23:03 UTC [84999] 0: [6-1] user=,db= host=,app=[] LOG:  recovered 
replication state of node 6 to 580/B03B13C0
2021-03-31 14:23:03 UTC [84999] 0: [7-1] user=,db= host=,app=[] LOG:  recovered 
replication state of node 5 to 228D/B578FBA8
2021-03-31 14:23:03 UTC [84999] 0: [8-1] user=,db= host=,app=[] LOG:  recovered 
replication state of node 3 to 5F4/6B0D7560
2021-03-31 14:23:03 UTC [84999] 0: [9-1] user=,db= host=,app=[] LOG:  recovered 
replication state of node 7 to 65F/9971C210
2021-03-31 14:23:03 UTC [84999] 0: [10-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 9 to 65F/B19E5C38
2021-03-31 14:23:03 UTC [84999] 0: [11-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 10 to 3D8/2E753F28
2021-03-31 14:23:03 UTC [84999] 0: [12-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 12 to 526/40791D18
2021-03-31 14:23:03 UTC [84999] 0: [13-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 13 to 521/E87DE3B8
2021-03-31 14:23:03 UTC [84999] 0: [14-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 4 to 526/409B7EE8
2021-03-31 14:23:03 UTC [84999] 0: [15-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 15 to 65F/B1817750
2021-03-31 14:23:03 UTC [84999] 0: [16-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 14 to 200/20004088
2021-03-31 14:23:03 UTC [84999] 0: [17-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 17 to 63F/5CF62898
2021-03-31 14:23:03 UTC [84999] 0: [18-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 18 to 96D/9D63BD30
2021-03-31 14:23:03 UTC [84999] 0: [19-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 16 to DA/43CB4728
2021-03-31 14:23:03 UTC [84999] 0: [20-1] user=,db= host=,app=[] LOG:  
recovered replication state of node 20 to 580/B03AFB58
2021-03-31 14:23:03 UTC [84999] 0: [21-1] user=,db= host=,app=[] L

logical replication initial copy and WAL generation

2021-05-19 Thread Radoslav Nedyalkov
Hello Forum,
I know it sounds weird but we have faced it a couple of times already and
have no idea.
We're trying to establish logical replication from RDS(10.9) to RDS(12.5).
We're adding tables to the publication one by one.
During the initial copy of some tables we experience huge WAL generation
on the source db:
from 0.5MB/s it jumps to 15MB/s.
This happens when a copy has just started, or in the middle, or at the end.
The WAL generation causes service statements to wait on WALWriteLock, causing
high response times.

Any idea how the initial copy can cause WAL generation?!

Thank you very much,

Rado


Re: logical replication initial copy and WAL generation

2021-05-19 Thread Radoslav Nedyalkov
We have wal_log_hints = off,
so I thought hint bits are never WAL-logged.
Or am I missing something - are they, upon first read?

Thanks again
Rado


On Wed, May 19, 2021 at 3:56 PM Laurenz Albe 
wrote:

> On Wed, 2021-05-19 at 14:16 +0300, Radoslav Nedyalkov wrote:
> > Hello Forum,
> > I know it sounds weird but we have faced it a couple of times already
> and have no idea.
> > We're trying to establish logical replication from RDS(10.9) to
> RDS(12.5).
> > We're adding tables to the publication one by one.
> > During the initial copy for some tables we experience huge WAL
> generation on the source db.
> > From 0.5mb/s it jumps to 15MB/s.
> > This happens when a copy just started or in the middle or at the end.
> > WAL generation causes service statements to wait on WALWriteLock causing
> high response time.
> >
> > Any idea how the initial copy can cause WAL generation ?!
>
> I don't know how much RDS is like PostgreSQL, but on PostgreSQL this could
> be
> the setting of hint bits for new rows that have never been read before.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>


pg_class.oid at 4B

2021-10-20 Thread Radoslav Nedyalkov
Hello all,
We have a busy database whose OID counter increases by 50-100M per month.
There is no oid type among user attributes anywhere, nor are there user tables
WITH OIDS;
we're not sure where this generation comes from. If you can give me a hint, it
would be great.
Also, what happens when pg_class.oid reaches 4B?
pg_class itself has only ~40K rows.
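
One quick look that sometimes gives a hint (just a sketch) is the newest relation
OIDs; anything that consumes OIDs without leaving a relation behind won't show up
here, of course:

SELECT oid, relname, relkind
FROM pg_class
ORDER BY oid DESC
LIMIT 20;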

Thank you very much,
Rado


table not found on publisher

2022-02-11 Thread Radoslav Nedyalkov
Hello All,
It is a somewhat specific logical replication setup where we are trying to
replicate a partitioned table (pg14.1) to a non-partitioned one (pg11.14).

After setting everything up, the subscriber fails on the initial copy with:
ERROR:  table "public.tab01" not found on publisher

If the subscription is created with (copy_data=false) changes are
propagated okay.

So I'm puzzled. Any comments ?

Thanks a lot
Rado

Here is the example:
SOURCE:
test[14.1]=# CREATE TABLE tab01 (id int PRIMARY KEY) PARTITION BY RANGE(id);
CREATE TABLE
test[14.1]=# CREATE TABLE tab01_10 PARTITION OF tab01 FOR VALUES FROM (0)
TO (10);
CREATE TABLE
test[14.1]=# CREATE TABLE tab01_20 PARTITION OF tab01 FOR VALUES FROM (10)
TO (20);
CREATE TABLE
test[14.1]=# insert into tab01 values (generate_series(1,15));
INSERT 0 15
test[14.1]=# CREATE PUBLICATION pub01 FOR TABLE public.tab01 WITH
(publish_via_partition_root = true);
CREATE PUBLICATION
test[14.1]=#

TARGET:
test[11.14]=# CREATE TABLE tab01 (id int PRIMARY KEY);
CREATE TABLE
test[11.14]=# create subscription sub01
connection 'host=/var/run/postgresql port=5435 dbname=test user=postgres'
publication pub01 ;
NOTICE:  created replication slot "sub01" on publisher
CREATE SUBSCRIPTION
test[11.14]=#


Re: table not found on publisher

2022-02-14 Thread Radoslav Nedyalkov
Just an update: if the target is upgraded to pg13, the initial copy goes fine.
Unfortunately it is a 40T db being sunset, so we'd rather not upgrade.

On Fri, Feb 11, 2022 at 4:50 PM Radoslav Nedyalkov 
wrote:

> Hello All,
> It is a bit specific logical replication setup where we try to replicate
> a partitioned table (pg14.1) to a non-partitioned one (pg11.14)
>
> After establishing everything the subscriber fails on the initial copy with
> ERROR:  table "public.tab01" not found on publisher
>
> If the subscription is created with (copy_data=false) changes are
> propagated okay.
>
> So I'm puzzled. Any comments ?
>
> Thanks a lot
> Rado
>
> Here is the example:
> SOURCE:
> test[14.1]=# CREATE TABLE tab01 (id int PRIMARY KEY) PARTITION BY
> RANGE(id);
> CREATE TABLE
> test[14.1]=# CREATE TABLE tab01_10 PARTITION OF tab01 FOR VALUES FROM (0)
> TO (10);
> CREATE TABLE
> test[14.1]=# CREATE TABLE tab01_20 PARTITION OF tab01 FOR VALUES FROM (10)
> TO (20);
> CREATE TABLE
> test[14.1]=# insert into tab01 values (generate_series(1,15));
> INSERT 0 15
> test[14.1]=# CREATE PUBLICATION pub01 FOR TABLE public.tab01 WITH
> (publish_via_partition_root = true);
> CREATE PUBLICATION
> test[14.1]=#
>
> TARGET:
> test[11.14]=# CREATE TABLE tab01 (id int PRIMARY KEY);
> CREATE TABLE
> test[11.14]=# create subscription sub01
> connection 'host=/var/run/postgresql port=5435 dbname=test user=postgres'
> publication pub01 ;
> NOTICE:  created replication slot "sub01" on publisher
> CREATE SUBSCRIPTION
> test[11.14]=#
>
>
>