Re: ALTER SUBSCRIPTION ... REFRESH cannot be executed from a function

2024-04-04 Thread Roman Šindelář
Hello,
I am attaching the solution we used.
Thank you for your answers and help,
Roman



--
-- ALTER SUBSCRIPTION REFRESH TEST CASE / SOLUTION
--

-- 9. install extension dblink + create function/procedure [DESTINATION DATABASE]
\c db2 postgres
create extension dblink;

create function test.dblink_record_execute(TEXT, TEXT)
RETURNS SETOF record LANGUAGE c
PARALLEL RESTRICTED STRICT
AS '$libdir/dblink', $$dblink_record$$
;

create procedure test.dblink_refresh_subscription(sSubName VARCHAR, user_pwd text)
SECURITY DEFINER AS
$$
DECLARE
BEGIN
  perform test.dblink_record_execute(
    pg_catalog.format('user=%L dbname=%L port=%L password=%L',
                      current_user, pg_catalog.current_database(),
                      (SELECT setting FROM pg_catalog.pg_settings WHERE name = 'port'),
                      user_pwd),
    pg_catalog.format('ALTER SUBSCRIPTION %I REFRESH PUBLICATION', sSubName)
  );
  raise notice 'Subscription % refreshed', sSubName;
END $$ LANGUAGE 'plpgsql';

grant execute on function test.dblink_record_execute(text,text) to usr_db_deploy;
grant execute on procedure test.dblink_refresh_subscription(varchar,text) to usr_db_deploy;

-- 10. disable subscription [DESTINATION DATABASE]
\c db2 usr_db_deploy
call test.disable_subscription('test_sub');
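-- Note: test.disable_subscription / test.enable_subscription were created in
-- the earlier (omitted) steps of this test case. A minimal sketch of what such
-- helpers might look like, assuming they are plain SECURITY DEFINER wrappers
-- around ALTER SUBSCRIPTION ... DISABLE/ENABLE (which, unlike REFRESH
-- PUBLICATION, may run inside a function), owned by a role allowed to alter
-- the subscription:

create procedure test.disable_subscription(sSubName VARCHAR)
SECURITY DEFINER AS
$$
BEGIN
  execute pg_catalog.format('ALTER SUBSCRIPTION %I DISABLE', sSubName);
  raise notice 'Subscription % disabled', sSubName;
END $$ LANGUAGE 'plpgsql';

create procedure test.enable_subscription(sSubName VARCHAR)
SECURITY DEFINER AS
$$
BEGIN
  execute pg_catalog.format('ALTER SUBSCRIPTION %I ENABLE', sSubName);
  raise notice 'Subscription % enabled', sSubName;
END $$ LANGUAGE 'plpgsql';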

-- 11. change: create new table [SOURCE DATABASE]
\c db1 usr_db_deploy
create table test.tab3 (id int primary key, num int);
grant select on table test.tab3 to usr_db_repl;
insert into test.tab3 values (3, 30);
select * from test.tab3;

-- 12. add table into publication [SOURCE DATABASE]
\c db1 usr_db_deploy
alter publication test_pub add table test.tab3;
select * from pg_publication_tables;

-- 13. create new table on destination side [DESTINATION DATABASE]
\c db2 usr_db_deploy
create table test.tab3 (id int primary key, num int);

-- 14. enable subscription [DESTINATION DATABASE]
\c db2 usr_db_deploy
call test.enable_subscription('test_sub');

-- 15. check new table [DESTINATION DATABASE]
-- Note: no rows, we need REFRESH PUBLICATION
\c db2 usr_db_deploy
select * from test.tab3;

-- 16. refresh subscription [DESTINATION DATABASE]
-- Note: run as non-superuser (usr_db_deploy)
\c db2 usr_db_deploy
call test.dblink_refresh_subscription('test_sub','');
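
-- A possible follow-up check (not part of the original steps): confirm that
-- the new table is now tracked by the subscription and has been synced.
select srrelid::regclass as tab, srsubstate
from pg_catalog.pg_subscription_rel;
select * from test.tab3;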




Re: Failure of postgres_fdw because of TimeZone setting

2024-04-04 Thread Adrian Klaver

On 4/3/24 22:23, Adnan Dautovic wrote:

Hi everyone,


I have some trouble using postgres_fdw in order to display some data from a 
Postgres database I do not control in a Postgres database that I do control. I 
filled out the form from the wiki below and would appreciate any tips.


* A description of what you are trying to achieve and what results you expect.:
I am trying to import the public schema of a Postgres instance I do not control (I will call it 
"remote"), but have read-only access to, into a Postgres instance I fully control (I will 
call it "local"), using the foreign data wrapper postgres_fdw.


Define 'read-only', especially as it applies to the privileges on the 
public schema.
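
For example, something along these lines run on the remote side as the
user the foreign server connects with would show what you actually have
(just a sketch):

SELECT current_user,
       has_schema_privilege('public', 'USAGE')  AS usage_on_public,
       has_schema_privilege('public', 'CREATE') AS create_on_public;

SELECT table_name, privilege_type
FROM information_schema.table_privileges
WHERE table_schema = 'public' AND grantee = current_user;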


Per Tom Lane's comments on timezone, log into the remote server and do:

SHOW timezone;

SET timezone = 'etc/UTC';

SET timezone = 'UTC';



Does anyone have an idea for me?

Kind regards,


Adnan Dautovic





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Moving delta data faster

2024-04-04 Thread Adrian Klaver

On 4/3/24 22:24, yudhi s wrote:


On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver wrote:


On 4/3/24 20:54, yudhi s wrote:
 > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver
 > <adrian.kla...@aklaver.com> wrote:
 >
 >    > Thank you Adrian.
 >
 > And one thing I forgot to mention: this target postgresql database
 > would be on AWS RDS whereas the source Oracle database is on premise.
 > I think we don't have the FDW extension currently in place but we can
 > get that. I am just not able to understand clearly though: do you mean
 > export the data from the source using CSV and do truncate and import on
 > target? And as these data will be moved through the network, won't that
 > cause slowness?
 >
 > The source database here is an Oracle database. Correct me if wrong, it
 > looks like a foreign data wrapper is like a DB link. Or do you mean
 > writing a query on the target database (which can be UPSERT or MERGE)
 > but joining the table from the source database through the
 > DBlink/FDW? But my question was whether we should use UPSERT or MERGE
 > for comparing and loading the delta records to the target postgresql
 > database. I want to understand which is more performant, as I have seen
 > Merge having performance issues in the past, but am not very sure
 > about that.

My motivation was to get some basic information about your setup and
what you are trying to achieve.

If I understand correctly you have:

1) An Oracle database with tables that you want to copy the complete
data from to a Postgres database. For this sort of thing COPY
(https://www.postgresql.org/docs/current/sql-copy.html) on the
Postgres end using CSV data generated from the source is probably the
quickest bulk load method.

2) After the initial load you want to do follow up INSERT/UPDATEs based
on a delta of the source tables relative to the initial load. This is
still a bit of a mystery to me. How are you determining the delta: a) On the
source end entirely or b) Target relative to source? Also what is the
anticipated size of the delta per transfer?

Additional information needed:

1) Network distance between source and target?

2) Network capacity?

3) Expected load on both source and target servers from other
operations?


Thank you. Actually I was trying to understand how to cater for the delta
load after the one-time load is done. The delta change in records is
planned to be found based on the primary keys on the tables. If it finds
the key it will update the records; if it does not find the key it will
insert the rows.


Basically the select query from the source database will fetch the data
with a certain time interval (based on the latest update timestamp or
create timestamp if they are available or else full dump) and put it on
S3 and then from the S3 it will be picked and gets merged to the target
postgres database. As upsert and merge both look similar, I was
wondering what we should use here for loading the delta records?


S3 is not a database. You will need to be more specific about '... then 
from the S3 it will be picked and gets merged to the target postgres 
database.'


--
Adrian Klaver
adrian.kla...@aklaver.com





What permissions are required for e.g. EXPLAIN UPDATE ...

2024-04-04 Thread Thomas Nyberg

Hello,

Or put another way, does an EXPLAIN UPDATE on a table really require 
UPDATE permissions to run? Why can't it be done without UPDATE 
permissions? I can understand EXPLAIN ANALYZE UPDATE requiring it, since 
it actually executes the statement, but why would a regular EXPLAIN 
UPDATE require it? Is this documented somewhere in standard postgres 
docs? The only answer I've been able to find on the internet is "The 
user needs all privileges needed for the query that should be explained" 
found here: https://stackoverflow.com/q/72984543


What does an EXPLAIN do exactly to require the extra permissions? I've 
been searching the postgres documentation and the internet, but haven't 
managed to find an answer.


Thanks for any help!

Cheers,
Thomas




Multiple COPY statements for one table vs one for ~half a billion records

2024-04-04 Thread Carl L
Hi there,

I have around half a billion records that are being generated from a back
end that are split into 80 threads (one per core) and I'm performing a copy
from memory ( from stdin binary) into Postgres from each of these threads -
i.e. there are 80 COPY statements being generated for one table that are
running concurrently. I can see each of the Postgres processes sitting at
around 15% CPU usage.

These are all also in the same transaction - I am the only one connected,
so it's not an issue to hold a big transaction.

I can see that many of the Postgres threads have a wait event "LWLock:
BufferContent", which I assume means that they are waiting for each other
before they can write to the table. Therefore, would it be more efficient
to combine all of these and put them into one COPY statement?

Thanks!


Re: Multiple COPY statements for one table vs one for ~half a billion records

2024-04-04 Thread Ron Johnson
On Thu, Apr 4, 2024 at 2:04 PM Carl L  wrote:

> Hi there,
>
> I have around half a billion records that are being generated from a back
> end that are split into 80 threads (one per core) and I'm performing a copy
> from memory ( from stdin binary) into Postgres from each of these threads -
> i.e. there are 80 COPY statements being generated for one table that are
> running concurrently. I can see each of the Postgres processes sitting at
> around 15% CPU usage.
>

Is the target table partitioned in the same way that the input data is
split?

That would make things faster...
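
Something like this, for instance (a sketch -- the table layout is made up
and it assumes each loader thread can be pointed at its own slice of the
data):

-- list-partitioned target, one partition per loader thread
CREATE TABLE records (
    thread_no int NOT NULL,
    id        bigint,
    payload   text
) PARTITION BY LIST (thread_no);

CREATE TABLE records_t00 PARTITION OF records FOR VALUES IN (0);
CREATE TABLE records_t01 PARTITION OF records FOR VALUES IN (1);
-- ... one per thread, 80 in total

-- each thread then COPYs straight into "its" partition, so the concurrent
-- writers stop contending for the same heap and index buffers
COPY records_t00 FROM STDIN (FORMAT binary);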


> These are all also in the same transaction - I am the only one connected,
> so it's not an issue to hold a big transaction.
>

Unless it fills up your WAL partition.



Re: What permissions are required for e.g. EXPLAIN UPDATE ...

2024-04-04 Thread Tom Lane
Thomas Nyberg  writes:
> Or put another way, does an EXPLAIN UPDATE on a table really require 
> UPDATE permissions to run? Why can't it be done without UPDATE 
> permissions?

IIRC, the reasoning is this: should you be allowed to run an EXPLAIN
on a table that you have no permissions for at all?  We've felt that
the answer to that has to be "no".  An example of why not is that
EXPLAIN must take at least a shared lock on the table, which should
not be allowed to someone without any permissions.

Having decided that, the next question is what permissions are enough,
and we've concluded that "the same as it'd take to actually run the
query" is a perfectly appropriate answer.  That in turn lets us
decide that "what strength of table lock should be taken?" can be
answered the same for EXPLAIN as for the underlying query.  This
simplifies life by not requiring there to be different code paths
for EXPLAIN and normal query running in various places.
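
A quick way to see it (just a sketch, with made-up names):

CREATE TABLE t (id int PRIMARY KEY, val text);
CREATE ROLE reader LOGIN;
GRANT SELECT ON t TO reader;

SET ROLE reader;
EXPLAIN SELECT * FROM t;          -- allowed: SELECT privilege is enough
EXPLAIN UPDATE t SET val = 'x';   -- rejected with a "permission denied" error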

regards, tom lane




Re: Moving delta data faster

2024-04-04 Thread yudhi s
On Thu, Apr 4, 2024 at 9:04 PM Adrian Klaver wrote:

> On 4/3/24 22:24, yudhi s wrote:
> >
> > On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver wrote:
>
> S3 is not a database. You will need to be more specific about '... then
> from the S3 it will be picked and gets merged to the target postgres
> database.'
>
>
The data from S3 will be dumped into the stage table and then the
upsert/merge will be done from that table to the actual table.


Re: Moving delta data faster

2024-04-04 Thread Greg Sabino Mullane
So you want advice from this open source community about moving things from
a proprietary database to another proprietary database, going through a
proprietary storage system along the way? :)

To answer the original question, use MERGE. That's its job, and your table
size is very tiny, so I doubt the performance impact will be worth worrying
about.
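
A minimal sketch (MERGE needs Postgres 15 or later; table and column names
here are made up, with "stage" standing in for wherever the delta lands):

MERGE INTO target t
USING stage s ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET col1 = s.col1, col2 = s.col2
WHEN NOT MATCHED THEN
    INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);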

Cheers,
Greg


Re: Moving delta data faster

2024-04-04 Thread Adrian Klaver

On 4/4/24 13:42, yudhi s wrote:


On Thu, Apr 4, 2024 at 9:04 PM Adrian Klaver wrote:


On 4/3/24 22:24, yudhi s wrote:
 >
 > On Thu, Apr 4, 2024 at 10:16 AM Adrian Klaver
 > <adrian.kla...@aklaver.com> wrote:

S3 is not a database. You will need to be more specific about '... then
from the S3 it will be picked and gets merged to the target postgres
database.'


The data from S3 will be dumped into the stage table and then the
upsert/merge will be done from that table to the actual table.


The S3 --> staging table would be helped by having the data as CSV and 
then using COPY. The staging --> final table step could be done as 
either ON CONFLICT or MERGE, you would need to test in your situation to 
verify which works better.
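
A rough sketch of that pipeline (names are illustrative, and it assumes a
primary key on id for the ON CONFLICT variant):

-- S3 --> staging, with the delta exported as CSV
\copy stage FROM 'delta.csv' WITH (FORMAT csv, HEADER true)

-- staging --> final table, ON CONFLICT variant
INSERT INTO target (id, col1, col2)
SELECT id, col1, col2 FROM stage
ON CONFLICT (id) DO UPDATE
    SET col1 = EXCLUDED.col1,
        col2 = EXCLUDED.col2;

TRUNCATE stage;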


--
Adrian Klaver
adrian.kla...@aklaver.com