Re: Logical replication fails when adding multiple replicas

2023-03-22 Thread Will Roper
Thanks for the response Hou,

I've had a look and when the tablesync workers are spinning up there are
some errors of the form:

"2023-03-17 18:37:06.900 UTC [4071] LOG:  logical replication table
synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has
started"
"2023-03-17 18:37:06.976 UTC [4071] ERROR:  could not create replication
slot ""pg_37986_sync_37922_7210774007126708177"": ERROR:  replication slot
""pg_37986_sync_37922_7210774007126708177"" already exists"
"2023-03-17 18:37:06.979 UTC [3964] LOG:  background worker ""logical
replication worker"" (PID 4071) exited with exit code 1"

However further down a slot for the same table is successfully created:

"2023-03-17 18:38:41.232 UTC [4098] LOG:  logical replication table
synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onspd"" has
started"

And it is not that table that gets stuck:

"2023-03-17 18:40:08.130 UTC [4052] ERROR:  could not start WAL streaming:
ERROR:  replication slot ""pg_37986_sync_37663_7210774007126708177"" does
not exist"
"2023-03-17 18:40:08.136 UTC [3964] LOG:  background worker ""logical
replication worker"" (PID 4052) exited with exit code 1"
"2023-03-17 18:40:08.884 UTC [4149] LOG:  logical replication table
synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""addressbase_uprntocouncil""
has started"
"2023-03-17 18:40:08.968 UTC [4149] ERROR:  could not start WAL streaming:
ERROR:  replication slot ""pg_37986_sync_37706_7210774007126708177"" does
not exist"
"2023-03-17 18:40:08.971 UTC [3964] LOG:  background worker ""logical
replication worker"" (PID 4149) exited with exit code 1"
"2023-03-17 18:40:08.975 UTC [4150] LOG:  logical replication table
synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""addressbase_address"" has
started"

I'm currently working around the issue by waiting until all tables are in '
STATE_READY' or 'STATE_FINISHED' then dropping the subscription, and
recreating it with 'copy_data=false' -
https://github.com/DemocracyClub/UK-Polling-Stations/blob/aada9def5102f44602dfaca420bb562a8d725870/deploy/files/scripts/setup_db_replication.sh#L46-L48

I've pasted in the complete logs from where I took the examples below, in
case you see something I've missed

Cheers,
Will

PS
The fujitsu postgres blog (https://www.postgresql.fastware.com/blog) is
great - thank you.

LOGS:

1679078208511,"2023-03-17 18:36:48.409 UTC [3964] LOG:  starting PostgreSQL
14.7 (Ubuntu 14.7-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc
(Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, 64-bit"
1679078208511,"2023-03-17 18:36:48.409 UTC [3964] LOG:  listening on IPv4
address ""127.0.0.1"", port 5432"
1679078208511,"2023-03-17 18:36:48.409 UTC [3964] LOG:  listening on Unix
socket ""/var/run/postgresql/.s.PGSQL.5432"""
1679078208511,2023-03-17 18:36:48.411 UTC [3965] LOG:  database system was
shut down at 2023-03-17 18:36:48 UTC
1679078212980,2023-03-17 18:36:48.416 UTC [3964] LOG:  database system is
ready to accept connections
1679078215787,"2023-03-17 18:36:55.701 UTC [4021] LOG:  logical replication
apply worker for subscription ""polling_stations_0561a02f66363d911"" has
started"
1679078215787,"2023-03-17 18:36:55.751 UTC [4022] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onsud"" has
started"
1679078216038,"2023-03-17 18:36:55.769 UTC [4023] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""auth_user_groups"" has
started"
1679078216038,"2023-03-17 18:36:55.911 UTC [4022] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""uk_geo_utils_onsud"" has
finished"
1679078216038,"2023-03-17 18:36:55.915 UTC [4027] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""file_uploads_upload"" has
started"
1679078216038,"2023-03-17 18:36:55.924 UTC [4023] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""auth_user_groups"" has
finished"
1679078216038,"2023-03-17 18:36:55.936 UTC [4028] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table
""auth_user_user_permissions"" has started"
1679078216289,"2023-03-17 18:36:56.035 UTC [4027] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""file_uploads_upload"" has
finished"
1679078216289,"2023-03-17 18:36:56.045 UTC [4029] LOG:  logical replication
table synchronization worker for subscription
""polling_stations_0561a02f66363d911"", table ""councils_usercouncils"" has
started"
1679078216289,"2023-03-17 18:36:56.057 UTC [4028] LOG:  logical repl

For temporary tables; truncate vs on commit delete all

2023-03-22 Thread Jim Vanns
Does anyone have any idea which is generally better (by better I mean
most efficient/quickest!) in this simple scenario? I have a temporary
table which I wish to retain for the duration of a long-running
session. However, for each transaction it must be purged. So, is it
better to;

a) Always CREATE TEMPORARY TABLE + ON COMMIT DROP for every transaction or;
b) TRUNCATE TABLE or;
c) ON COMMIT DELETE ROWS

Both b & c avoid re-creating the table each time but at the cost of an
explicit purge of some form.
I would assume that despite its name, the latter, example c, is more
akin to a truncation? Or does it actually perform a deletion?

Cheers

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




Re: For temporary tables; truncate vs on commit delete all

2023-03-22 Thread Laurenz Albe
On Wed, 2023-03-22 at 11:59 +, Jim Vanns wrote:
> Does anyone have any idea which is generally better (by better I mean
> most efficient/quickest!) in this simple scenario? I have a temporary
> table which I wish to retain for the duration of a long-running
> session. However, for each transaction it must be purged. So, is it
> better to;
> 
> a) Always CREATE TEMPORARY TABLE + ON COMMIT DROP for every transaction or;
> b) TRUNCATE TABLE or;
> c) ON COMMIT DELETE ROWS
> 
> Both b & c avoid re-creating the table each time but at the cost of an
> explicit purge of some form.
> I would assume that despite its name, the latter, example c, is more
> akin to a truncation? Or does it actually perform a deletion?

b) or c) are best.
Yes, ON COMMIT DELETE ROWS will truncate the temporary table.

Creating too many temporary tables in a short time can lead to bloat
in "pg_attribute".

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-22 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
> ...I understand that you ask questions to gain deeper understanding.
> 
>> b...@yugabyte.com wrote:
>> 
>> ...I had never come across use cases where [scrollability] was beneficial. I 
>> wanted, therefore, to hear about some. I thought that insights here would 
>> help me understand the mechanics.
> 
> I recently used cursor scrollability, so I can show you a use case:
> 
> github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49
> 
> The goal is to get the query result count right away, without having to run a 
> second query for it: you declare the cursor, move to the end of the result 
> set, fetch the ROW_COUNT, then move back to the beginning of the result set 
> and start fetching the result rows.
> 
> ...I personally find that reading the PostgreSQL documentation gets you far, 
> but only so far: for deep understanding, you have to read the code. It is 
> usually well
> documented and readable, and I have come to see it as an extension of the 
> documentation that covers the details.

Thanks for the link to your SQL file at the line where you get the row count in 
the way that you describe. I saw that this is in the PL/pgSQL source text for 
function "materialize_foreign_table()" (~200 lines). And I saw that you use the 
cursor mechanism that we're discussing here in only one other function, 
"db_migrate_refresh()" (~480 lines). But this second one doesn't use the "move 
forward all" device to get a row count. I looked only at these two functions.

I noted that neither of these functions has a refcursor formal argument and 
that, rather, you open (i.e. create) each of the three cursors that you use 
within the two functions that uses them. I noted, too,  that for the three 
"select" statements that you use to open your refcursors, none of these has an 
"order by". I noted that your code implements "create table destination" and 
"insert into destination... select from source..." where order doesn't matter. 

However, source code famously reveals only what it does and not what the 
author's intention, and overall design philosophy, is. I looked at the README 
accounts for these two functions here:

github.com/cybertec-postgresql/db_migrator#materialize_foreign_table
github.com/cybertec-postgresql/db_migrator#db_migrate_refresh

But these accounts say nothing about the design of their implementations. The 
accounts, and the bigger picture, seem to imply that read consistency in the 
presence of concurrent activity from other sessions is not a concern. I'm 
guessing that this is accounted for upstream from how the code that I'm looking 
at operates—i.e. that the source database is extracted to staging tables like, 
say, export does so that your code operates as the only session that reads 
from, creates, and populates the tables that it references.

The upshot, therefore, is that I'm afraid that I can only guess at why you use 
"open, fetch, close" on a refcursor rather than an ordinary cursor for loop. 
After all, you use the technique only to traverse metadata tables about 
partitions, subpartitions, and columns. I'd be astonished if such tables have 
humongous numbers of rows (like hundreds of millions).

As a sanity test, I did this:

create function s.t_count()
  returns int
  set search_path = pg_catalog, pg_temp
  language plpgsql
as $body$
declare
  cur refcursor := 'cur';
  n int not null := 0;
begin
  open cur for select v from s.t;
  move forward all in cur;
  get diagnostics n = row_count;
  move absolute 0 in cur;
  return n;
end;
$body$;

Then I timed these two alternatives on a test table with ten million rows.

select count(*) from s.t;
select s.t_count();

They both got the same answer. But my function took about twice as long as the 
native count(*). Mind you, with "only" 10 million rows (and using a PG cluster 
in a VM on my Mac Book) the times were tiny: ~600ms versus ~300ms.

Then I tried this:

create procedure s.p(tab in text, mode in text)
  set search_path = pg_catalog, pg_temp
  security definer
  language plpgsql
as $body$
declare
  cur   constant refcursor not null := 'cur';
  stmt  constant text  not null := format('select k, v from s.%I', 
tab);
  cnt_stmt  constant text  not null := format('select count(*) from s.%I', 
tab);

  kk   int not null := 0;
  vv   int not null := 0;
  kint;
  vint;
  nint not null := 0;
  cnt  int not null := 0;
begin
  case mode
when 'naive' then
  execute cnt_stmt into cnt;
  for kk, vv in execute stmt loop
n := n + 1;
assert kk = n and vv = n*2;
  end loop;
  assert n = cnt;

when 'refcursor' then
  open cur for execute stmt;
  move forward all in cur;
  get diagnostics cnt = row_count;
  move absolute 0 in cur;

  loop
fetch cur into k, v;
exit when not found;
n := n + 1;
assert k = n and v = n*2;
  end loop;
  close cur; -- Just as a formality.
 

Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-22 Thread Laurenz Albe
On Wed, 2023-03-22 at 12:09 -0700, Bryn Llewellyn wrote:
> > laurenz.a...@cybertec.at wrote:
> > I recently used cursor scrollability, so I can show you a use case:
> > 
> > github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49
>
> However, source code famously reveals only what it does and not what the 
> author's intention,
> and overall design philosophy, is. I looked at the README accounts for these 
> two functions here:
> 
> github.com/cybertec-postgresql/db_migrator#materialize_foreign_table
> github.com/cybertec-postgresql/db_migrator#db_migrate_refresh
> 
> But these accounts say nothing about the design of their implementations.

The documentation describes the purpose of the function and how to use it,
not how I went about implementing it.


You are interpreting too much.  I just wanted to show you a case where 
scrollable cursors
can be useful.

> The upshot, therefore, is that I'm afraid that I can only guess at why you 
> use "open, fetch, close"
> on a refcursor rather than an ordinary cursor for loop. After all, you use 
> the technique only
> to traverse metadata tables about partitions, subpartitions, and columns. I'd 
> be astonished
> if such tables have humongous numbers of rows (like hundreds of millions).

No, I don't expect large result sets, and concurrency won't be a problem.

I explained why I used scrollable cursors: to save myself an extra query for
the total result set count.

Here is another account of how cursors can be useful:
https://www.cybertec-postgresql.com/en/pagination-problem-total-result-count/

Yours,
Laurenz Albe




Re: Is the PL/pgSQL refcursor useful in a modern three-tier app?

2023-03-22 Thread Adrian Klaver

On 3/22/23 12:09, Bryn Llewellyn wrote:

laurenz.a...@cybertec.at  wrote:

...I understand that you ask questions to gain deeper understanding.


b...@yugabyte.com  wrote:

...I had never come across use cases where [scrollability] was 
beneficial. I wanted, therefore, to hear about some. I thought that 
insights here would help me understand the mechanics.


I recently used cursor scrollability, so I can show you a use case:

github.com/cybertec-postgresql/db_migrator/blob/master/db_migrator--1.0.0.sql#L49 




Thanks for the link to your SQL file at the line where you get the row 
count in the way that you describe. I saw that this is in the PL/pgSQL 
source text for function "materialize_foreign_table()" (~200 lines). And 
I saw that you use the cursor mechanism that we're discussing here in 
only one other function, "db_migrate_refresh()" (~480 lines). But this 
second one doesn't use the "move forward all" device to get a row count. 
I looked only at these two functions.


I noted that neither of these functions has a refcursor formal argument 
and that, rather, you open (i.e. create) each of the three cursors that 
you use within the two functions that uses them. I noted, too,  that for 
the three "select" statements that you use to open your refcursors, none 
of these has an "order by". I noted that your code implements "create 
table destination" and "insert into destination... select 
from source..." where order doesn't matter.


CREATE FUNCTION materialize_foreign_table(
   schema name,
   table_name name,
   with_data boolean DEFAULT TRUE,
   pgstage_schema name DEFAULT NAME 'pgsql_stage'
) RETURNS boolean
   LANGUAGE plpgsql VOLATILE STRICT SET search_path = pg_catalog AS
$$DECLARE
   ft name;
   stmt   text;
   errmsg text;
   detail text;
   cur_partitions refcursor;
   cur_subpartitions  refcursor;

...


CREATE FUNCTION db_migrate_refresh(
   plugin name,
   staging_schema nameDEFAULT NAME 'fdw_stage',
   pgstage_schema nameDEFAULT NAME 'pgsql_stage',
   only_schemas   name[]  DEFAULT NULL
) RETURNS integer
   LANGUAGE plpgsql VOLATILE CALLED ON NULL INPUT SET search_path = 
pg_catalog AS

$$DECLARE
   extschema   text;
   old_msgleveltext;
   v_plugin_schema text;
   v_create_metadata_views regproc;
   v_translate_datatyperegproc;
   v_translate_identifier  regproc;
   v_translate_expression  regproc;
   c_col   refcursor

...

What is not formal about the above?

Though it does not matter as, back to the docs again:

https://www.postgresql.org/docs/current/plpgsql-cursors.html

"All access to cursors in PL/pgSQL goes through cursor variables, which 
are always of the special data type refcursor. One way to create a 
cursor variable is just to declare it as a variable of type refcursor. 
Another way is to use the cursor declaration syntax, which in general is:


name [ [ NO ] SCROLL ] CURSOR [ ( arguments ) ] FOR query;
"

Though maybe you are trying to differentiate between bound and unbound 
refcursor's, where the ones defined in the functions are unbound.


Also order by is not relevant for getting a count.


Finally, I see how an understanding of internals helps the understanding 
of performance-related question. But I find it very hard to accept that 
I should read the C implementation of PostgreSQL in order to get the 
proper mental model, and the proper terms of art, that I need to 
understand it semantics (and the corresponding SQL and PL/pgSQL syntax). 
Having said that, I did a little test an saw that this:


move last in cur;
get diagnostics n = row_count;

reported just 1—in contrast to what you used:

move forward all in cur;


Because they are different actions.

The first directly moves to the last row in the cursor and counts just 
that row


The second moves through all the rows left in the cursor and counts all 
the rows it traversed.


It is spelled out here:

https://www.postgresql.org/docs/current/sql-move.html

"The count is the number of rows that a FETCH command with the same 
parameters would have returned (possibly zero)."





I've no idea how it's possible to navigate to the last result in the set 
without knowing how many there are. Maybe that fact is there 
internally—but with no explicit SQL or PL/pgSQL exposure of the value.







--
Adrian Klaver
adrian.kla...@aklaver.com