Re: COPY FROM - to avoid WAL generation

2018-08-21 Thread Jeff Ross

On 8/21/18 9:00 AM, Ravi Krishna wrote:

In a recent thread of mine I learned something very interesting. If a 
table is created and data is loaded via COPY FROM within the same 
transaction, then PG is smart enough not to generate WAL, because all 
it needs to do is track the status of the transaction and let the data 
load go to the new data file created for the table. If the transaction 
commits, the table is released for other sessions; if it is rolled 
back, vacuum will delete the data file later on.
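
In outline, the pattern is (a sketch--the table, columns and file are 
made up, and note the docs tie this WAL skip to wal_level = minimal):

    BEGIN;
    CREATE TABLE big_load (id int, payload text);
    COPY big_load FROM '/tmp/big_load.csv' (FORMAT csv);  -- skips WAL only when wal_level = minimal
    COMMIT;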


I tested it as follows for a table with 50 million rows. No indexes.

Case 1
  - create the table first.
  - in a separate transaction load the 50 million rows.

Took 3 min 22 seconds

Case 2
  - start transaction
  - create table
  - load 50 million rows
  - commit transaction

Took: 3 min 16 seconds.

Am I missing anything?


Have you looked into pg_bulkload?

https://github.com/ossc-db/pg_bulkload

Docs are here:

http://ossc-db.github.io/pg_bulkload/index.html
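
For a flavor of the setup, a load is driven by a small control file and 
the pg_bulkload binary--roughly like this sketch (table, paths and 
values invented; see the docs above for the full option list):

    # sample.ctl
    OUTPUT = public.my_table     # target table
    INPUT = /tmp/my_table.csv    # input data file
    TYPE = CSV                   # input file type
    DELIMITER = ","

and then run with something like:

    pg_bulkload -d mydb sample.ctl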

Jeff


Debugging an intermittent foreign key insert error with csvlog

2019-09-16 Thread Jeff Ross

Hi all,

I've been debugging an intermittent foreign key insert error on our 
single database / multi-tenant server.  To help isolate traffic by 
tenant, I've switched to using csvlog for the duration and have set 
log_min_duration_statement to 0 to get *everything*.  Fortunately, the 
daily 80G csvlogs compress nicely.


For lack of a readable way to paste the 12 relevant csvlog lines into 
an e-mail, I've uploaded a very small 3K csv file to my web server at


https://openvistas.net/hansens_error.csv

The bare bones of the issue: insert a row into a table named load_det, 
get the serial sequence of that table, get the last_value of that 
sequence, and then insert into another table named cargo_det using that 
retrieved last_value as the foreign key that ties the row to load_det.  
The vast majority of these succeed without issue, but not all.
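
In rough SQL, with hypothetical column names, the sequence looks like:

    INSERT INTO load_det (shipper) VALUES ('ACME');
    SELECT pg_get_serial_sequence('load_det', 'id');  -- 'public.load_det_id_seq'
    SELECT last_value FROM load_det_id_seq;           -- reflects *any* session's nextval()
    INSERT INTO cargo_det (load_det_id, descr)
        VALUES (42, 'pallets');                       -- 42 = the last_value read above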


The csvlog snip shows what I believe are 2 simultaneous but separate 
sessions, and the session that attempts to insert into the cargo_det 
table is not the same session that inserted into the load_det table.  
That's my hunch, anyway, but what is unclear to me is whether those 
separate sessions are also in separate transactions.


csvlog has a couple of columns that I'm unclear about: 
session_start_time and virtual_transaction_id.  Is session_start_time 
the time inside a transaction block, as in beginning with a BEGIN but 
before a COMMIT or ROLLBACK?  Or is it maybe just how long this 
pgbouncer session has been connected?  virtual_transaction_id is 
defined in the docs as backendID/localXID--do separate backendIDs also 
represent separate transactions?  Is there a better way to determine 
separate transactions within csvlog?


Also, the app code that does this is legacy Perl using DBD::Pg, but the 
original code was written for Informix.  We've been in the process of 
moving off Informix for a while now and should be done within the 
month.  I intend to re-work this to use RETURNING id (available in 
Postgres since 8.2!) instead of the serial sequence / last_value hack, 
but I'm not quite there yet.
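
The RETURNING version collapses all of that into a single statement and 
can't race (again with hypothetical columns):

    INSERT INTO load_det (shipper) VALUES ('ACME') RETURNING id;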


Thanks,

Jeff Ross




Re: Debugging an intermittent foreign key insert error with csvlog

2019-09-16 Thread Jeff Ross

On 9/16/19 4:07 PM, Adrian Klaver wrote:

On 9/16/19 1:46 PM, Jeff Ross wrote:




The csvlog snip shows what I believe are 2 simultaneous but separate 
sessions, and the session that attempts to insert into the cargo_det 
table is not the same session that inserted into the load_det table.  
That's my hunch, anyway, but what is unclear to me is whether those 
separate sessions are also in separate transactions.


To me it looks like the INSERTs into load_det and into cargo_det are 
occurring in the same transaction (934281062).  The part that would 
concern me is that:


select last_value from load_det_id_seq

occurs in different transactions and sessions. From here:

https://www.postgresql.org/docs/11/sql-createsequence.html

"Also, last_value will reflect the latest value reserved by any session, 
whether or not it has yet been returned by nextval."


Especially as the error is coming from a different transaction 
(934281063) and session than the INSERTs.  I'm guessing that there is 
cross talk on the sequence number fetch and its application to 
cargo_det.




Thank you Adrian--I think my hunch was basically correct then.  Now all 
I need to do is figure out why we have multiple sessions and 
transactions.  Or it might be time to skip ahead and get rid of the 
last_value query.


Jeff




Re: Query which shows FK child columns?

2019-11-14 Thread Jeff Ross

On 11/14/19 11:49 AM, Ron wrote:

v9.6.16

I have a query which shows the parents and children in FK relations, 
along with the parent column name, but can't seem to find the child 
column names.


Is there a way to find the child column names without having to dig 
into pg_constraint?


Thanks



I do not think you can do this without using pg_constraint.

I've been using this function to display those FKs.  The original code 
isn't mine, but as I recall I had to tweak it a little.

This is on 10 and I can't remember if this was used on 9.6, but I'd be 
surprised if any of it won't work on 9.6.


client@cargotel_dev> \sf cargotel_common.show_foreign_keys(text)
CREATE OR REPLACE FUNCTION cargotel_common.show_foreign_keys(tablename text)
 RETURNS TABLE(table1 text, column1 text, type text, table2 text, column2 text)
 LANGUAGE plpgsql
AS $function$
    declare
        schemaname text;
    begin
        select into schemaname current_schema();
        return query execute format('
            select
                conrelid::regclass::text as table1,
                a.attname::text as column1,
                t.typname::text as type,
                confrelid::regclass::text as table2,
                af.attname::text as column2
            from
                pg_attribute af,
                pg_attribute a,
                pg_type t,
                (select
                     conrelid,
                     confrelid,
                     conkey[i] as conkey,
                     confkey[i] as confkey
                 from
                     (select
                          conrelid,
                          confrelid,
                          conkey,
                          confkey,
                          generate_series(1, array_upper(conkey, 1)) as i
                      from
                          pg_constraint
                      where contype = ''f'') ss) ss2
            where
                af.attnum = confkey and
                af.attrelid = confrelid and
                a.attnum = conkey and
                a.attrelid = conrelid and
                a.atttypid = t.oid and
                confrelid::regclass = ''%I.%I''::regclass
            order by 1, 2;', schemaname, tablename);
    end;
$function$

I use column headings "table1, column1, type, table2, column2" but it's 
easy enough to tweak the column labels.


Example:


client@cargotel_dev> \d+ ref_acct_cache
                                        Table "client.ref_acct_cache"
 Column │  Type   │ Collation │ Nullable │                  Default                   │ Storage  │ Stats target │ Description
────────┼─────────┼───────────┼──────────┼────────────────────────────────────────────┼──────────┼──────────────┼─────────────
 id     │ integer │           │ not null │ nextval('ref_acct_cache_id_seq'::regclass) │ plain    │              │
 descr  │ text    │           │          │                                            │ extended │              │

Indexes:
    "ref_acct_cache_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "acct_cache" CONSTRAINT "acct_cache_type_id_ref_acct_cache_id_fk" FOREIGN KEY (type_id) REFERENCES ref_acct_cache(id)



client@cargotel_dev> select * from cargotel_common.show_foreign_keys('ref_acct_cache');

   table1   │ column1 │ type │     table2     │ column2
────────────┼─────────┼──────┼────────────────┼─────────
 acct_cache │ type_id │ int4 │ ref_acct_cache │ id
(1 row)


client@cargotel_dev> \d+ acct_cache
                                            Table "client.acct_cache"
    Column     │           Type           │ Collation │ Nullable │                Default                 │ Storage  │ Stats target │ Description
───────────────┼──────────────────────────┼───────────┼──────────┼────────────────────────────────────────┼──────────┼──────────────┼─────────────
 id            │ integer                  │           │ not null │ nextval('acct_cache_id_seq'::regclass) │ plain    │              │
 type_id       │ integer                  │           │          │                                        │ plain    │              │
 prefix        │ text                     │           │          │                                        │ extended │              │
 data          │ text                     │           │          │                                        │ extended │              │
 amount        │ numeric                  │           │          │                                        │ main     │              │
 timestamp     │ timestamp with time zone │           │          │                                        │ plain    │              │
 check_number  │ text                     │           │          │                                        │ extended │              │
 client_number │ text                     │           │          │                                        │ extended │              │
 check_date    │ date                     │           │          │                                        │ plain    │              │

Indexes:
    "acct_cache_pkey" PRIMARY KEY, btree (id)
    "acct_cache_prefix_type_id_data_idx" btree (prefix, type_id, data)
    "acct_ca

Re: Question: what is proper way to define python function as event_trigger?

2019-12-22 Thread Jeff Ross

On 2019-12-22 15:27, Andrei Pozolotin wrote:


Hello.

Problem:

1. any attempt to define python function as an event_trigger, i.e.:

CREATE FUNCTION public.verify()
RETURNS event_trigger
LANGUAGE 'plpython3u'
AS $$ print("hello-kitty") $$;

2. fails with message:

ERROR: trigger functions can only be called as triggers

SQL state: 0A000

3. here in the source:

https://github.com/postgres/postgres/blob/master/src/pl/plpython/plpy_procedure.c#L226

Question:

what is proper way to define python function as event_trigger?

Thank you.



Just do

"create function public.verify() as trigger..."

https://www.postgresql.org/docs/10/plpython-trigger.html

TD["event"] contains the type of event as a string and I routinely do 
things like


    if TD["event"] == "UPDATE":
        #do update stuff
    elif TD["event"] == "INSERT":
        #do insert related stuff

Jeff





Re: Question: what is proper way to define python function as event_trigger?

2019-12-22 Thread Jeff Ross

On 2019-12-22 16:07, Jeff Ross wrote:

On 2019-12-22 15:27, Andrei Pozolotin wrote:


Hello.

Problem:

1. any attempt to define python function as an event_trigger, i.e.:

CREATE FUNCTION public.verify()
RETURNS event_trigger
LANGUAGE 'plpython3u'
AS $$ print("hello-kitty") $$;

2. fails with message:

ERROR: trigger functions can only be called as triggers

SQL state: 0A000

3. here in the source:

https://github.com/postgres/postgres/blob/master/src/pl/plpython/plpy_procedure.c#L226 



Question:

what is proper way to define python function as event_trigger?

Thank you.



Just do

"create function public.verify() as trigger..."


My bad--that should be "create function public.verify() returns trigger... "

Jeff





Getting more detail in plpython error messages

2022-03-07 Thread Jeff Ross

Hi all,

In psql a database error will print both ERROR: and DETAIL: lines.

postgres@testdb# delete from inspection where bundle_id in (select id from test_archive_20170401.load order by id);
ERROR:  update or delete on table "inspection" violates foreign key constraint "inspection_weather_inspection_id_inspection_id_fk" on table "inspection_weather"
DETAIL:  Key (id)=(158967) is still referenced from table "inspection_weather".


With plpython (both u and 3u) all I see printed is the ERROR part.

    try:
        check = plpy.execute("delete from inspection where bundle_id in (select id from test_archive_20170401.load order by id)")
    except plpy.SPIError as e:
        plpy.notice("Error!", e)

postgres@testdb# select * from test_delete();
NOTICE:  ('Error!', ForeignKeyViolation('update or delete on table "inspection" violates foreign key constraint "inspection_weather_inspection_id_inspection_id_fk" on table "inspection_weather"',))


Is there a way to get the DETAIL part as well?

Thanks,

Jeff




Re: Getting more detail in plpython error messages

2022-03-07 Thread Jeff Ross

On 3/7/22 11:06 AM, Tom Lane wrote:

Jeff Ross  writes:

Is there a way to get the DETAIL part as well?

It's not very well documented AFAICS, but a SPIError object has a
"detail" attribute, so "e.detail" should help you.  It looks like
you might prefer to print "e.spidata", which seems to contain all
the available fields.

regards, tom lane



Thank you, Tom!  As always, that is exactly what I need.
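
For the archives, folding that into the earlier except block looks like 
this (a sketch):

    except plpy.SPIError as e:
        plpy.notice("Error!", e)
        plpy.notice("Detail:", e.detail)        # the DETAIL line
        plpy.notice("All fields:", e.spidata)   # everything available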

Jeff




Logically replicated table has no visible rows

2022-05-31 Thread Jeff Ross

Hello,

We have a logically replicated table on RDS that is 39 G in size on both 
the publisher (10.21) and the subscriber (12.8).


The replication slots on the publisher are all marked as active and the 
LSNs are current, so there is no lag.


Other tables on the subscriber side are also identical in size and have 
no problem with queries against them.


We did a VACUUM FULL on the subscriber table; the size dropped to a 
couple hundred MBs and is growing again, but the table still has no 
visible rows.


Now on the publisher I see a temporary replication slot, so I'm 
guessing that replication had in fact not finished, and that's why the 
load_events table was full sized but had no visible rows?


At this point I guess my question is: does my hunch make sense that 
replication not having completely finished is the root cause of what 
we saw?


I've always used both the lsns and the lack of temporary replication 
slots to tell when logical replication had finished--is there a more 
authoritative way to do so?
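
(Is pg_subscription_rel on the subscriber the right place to look?  A 
sketch of what I'd check, where 'r' should mean the initial copy is 
done:

    select srrelid::regclass, srsubstate
    from pg_subscription_rel
    where srsubstate <> 'r';

An empty result would then mean every table has finished its initial 
sync--but confirmation would be welcome.)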


Thanks,

Jeff




Re: Logically replicated table has no visible rows

2022-06-03 Thread Jeff Ross

On 5/31/22 11:46 AM, Jeff Ross wrote:

Hello,

We have a logically replicated table on RDS that is 39 G in size on 
both the publisher (10.21) and the subscriber (12.8).


The replication slots on the publisher are all marked as active and 
the lsns are current so no lag.


Other tables on the subscriber side are also identical in size and 
have no problem with queries against them.


We did a VACUUM FULL on the subscriber table; the size dropped to a 
couple hundred MBs and is growing again, but the table still has no 
visible rows.


Now on the publisher I see a temporary replication slot so I'm 
guessing that in reality replication had not finished and that's why 
the load_events table was full sized but had no visible rows?


At this point I guess my question is: does my hunch make sense that 
replication not having completely finished is the root cause of what 
we saw?


I've always used both the lsns and the lack of temporary replication 
slots to tell when logical replication had finished--is there a more 
authoritative way to do so?


Thanks,

Jeff




As a follow up for the archives...

It became apparent the longer we looked that logical replication in 
fact had not finished, and indeed never did finish.  On both an EC2 
server and an RDS server we were missing at least one table that, by 
table size, appeared to be identical to the publisher's table and yet 
had no visible rows.  On the publisher I would occasionally see 
temporary replication slots, but they would go away in a few minutes.


I then dropped the subscription on the RDS server and immediately 
re-created it with (copy_data = false).  As expected, we immediately 
saw rows being inserted into that massive table, but it held only those 
new rows.  I have a "backfill" process to sync up logically replicated 
tables and that ran, but very, very slowly.  Finally, while syncing 
that big table, it stalled completely.  Network monitoring showed about 
100Kb/s of traffic to the RDS server.  That is not a typo.


Our client spun up another RDS server and imported all the settings 
from the previous one, and I set up the database and imported the 
schemas.  As soon as we started replication we were seeing network 
transfers in the 45Mb/s range.  Replication finished, with all tables 
intact, in under 5 hours for 420G.


We changed nothing on our side, so whatever was causing the glacial 
data transfer was on the AWS side.


Jeff




15 pg_upgrade with -j

2023-05-22 Thread Jeff Ross

Hello!

We are moving from 10 to 15 and are in testing now.

Our development database is about 1400G and takes 12 minutes to complete 
a pg_upgrade with the -k (hard-links) version.  This is on a CentOS 7 
server with 80 cores.


Adding -j 40 to use half of those cores also finishes in 12 minutes, 
and ps / top / htop never show more than a single process at a time in 
use.


Bumping that to -j 80 to use them all also finishes in 12 minutes and 
still only a single process.


Running the suggested vacuum analyze after pg_upgrade completes takes 
about 19 minutes.  Adding -j 40 takes that time down to around 5 
minutes, jumps the server load up over 30 and htop shows 40 processes.
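
(For reference, that's the stock suggestion from pg_upgrade--something 
like:

    vacuumdb --all --analyze-in-stages --jobs=40

with --jobs being the only thing we varied.)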


If -j 40 helps there--why not with pg_upgrade?

The full commands we are using for pg_upgrade are pretty stock:

time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k
time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 40
time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 80


Our production database is closer to 1900G.  If we're looking at a 30 
minute pg_upgrade window we'll be okay but if there is anything we can 
do to knock that time down we will and any suggestions to do so would be 
greatly appreciated.


Jeff Ross

Re: 15 pg_upgrade with -j

2023-05-22 Thread Jeff Ross

On 5/22/23 5:24 PM, Adrian Klaver wrote:

On 5/22/23 16:20, Jeff Ross wrote:

Hello!

We are moving from 10 to 15 and are in testing now.

Our development database is about 1400G and takes 12 minutes to 
complete a pg_upgrade with the -k (hard-links) version. This is on a 
CentOS 7 server with 80 cores.


Adding -j 40 to use half of those cores also finishes in 12 minutes 
and ps / top/ htop never show more than a single process at a time in 
use.


Bumping that to -j 80 to use them all also finishes in 12 minutes and 
still only a single process.


Running the suggested vacuum analyze after pg_upgrade completes takes 
about 19 minutes.  Adding -j 40 takes that time down to around 5 
minutes, jumps the server load up over 30 and htop shows 40 processes.


If -j 40 helps there--why not with pg_upgrade?


From docs:

https://www.postgresql.org/docs/current/pgupgrade.html

The --jobs option allows multiple CPU cores to be used for 
copying/linking of files and to dump and restore database schemas in 
parallel; a good place to start is the maximum of the number of CPU 
cores and tablespaces. This option can dramatically reduce the time to 
upgrade a multi-database server running on a multiprocessor machine.


So is the 1400G mostly in one database in the cluster?



The full commands we are using for pg_upgrade are pretty stock:

time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k
time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 40
time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 80


Our production database is closer to 1900G.  If we're looking at a 30 
minute pg_upgrade window we'll be okay but if there is anything we 
can do to knock that time down we will and any suggestions to do so 
would be greatly appreciated.


Jeff Ross


Yes, one big database with about 80 schemas and several other smaller 
databases so -j should help, right?


Jeff

Re: 15 pg_upgrade with -j

2023-05-23 Thread Jeff Ross



On 5/22/23 5:42 PM, Tom Lane wrote:

Jeff Ross  writes:

On 5/22/23 5:24 PM, Adrian Klaver wrote:

So is the 1400G mostly in one database in the cluster?

Yes, one big database with about 80 schemas and several other smaller
databases so -j should help, right?

AFAICT from a quick look at the code, you won't get any meaningful
parallelism unless you have several large DBs and/or several large
tablespaces.  It looks like the assumption was that issuing link()
requests in parallel wouldn't help much but just swamp your disk
if they're all on the same filesystem.  Maybe that could use
rethinking, not sure.

regards, tom lane


Thanks Tom.  These are all smokingly fast SSDs so it would be 
interesting to see how well they'd hold up under that load.


Jeff

Re: 15 pg_upgrade with -j

2023-05-23 Thread Jeff Ross

On 5/22/23 5:43 PM, Adrian Klaver wrote:




From docs:

https://www.postgresql.org/docs/current/pgupgrade.html

The --jobs option allows multiple CPU cores to be used for 
copying/linking of files and to dump and restore database schemas in 
parallel; a good place to start is the maximum of the number of CPU 
cores and tablespaces. This option can dramatically reduce the time 
to upgrade a multi-database server running on a multiprocessor machine.


So is the 1400G mostly in one database in the cluster?



The full commands we are using for pg_upgrade are pretty stock:


Yes, one big database with about 80 schemas and several other smaller 
databases so -j should help, right?



As I understand it, no.  The parallelism is between databases, not 
within a database.  Further, "database schemas" there refers to the 
schema as in the overall database object definitions, not the 
namespaces known as schemas within a database.


Thanks Adrian.  That "restore database schemas in parallel" phrase is 
really easy to read the way we did and expect it to work with one 
database and multiple schemas.


Maybe it should be changed to "restore multiple databases in parallel" 
instead?


Jeff

Re: Question: Multiple pg clusters on one server can be reached with the standard port.

2023-06-16 Thread Jeff Ross



We have already looked at pgbouncer and it works with that, but 
unfortunately you have to do the authentication in pgbouncer, which we 
don't like so much.

Regards,
Michael



You can set up pgbouncer to authenticate in postgres:

https://www.2ndquadrant.com/en/blog/understanding-user-management-in-pgbouncer/
See the "auth_query, auth_user" section.

https://www.2ndquadrant.com/en/blog/pg-phriday-securing-pgbouncer/
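
The short version of the auth_query setup, as a sketch (function and 
role names invented; the posts above have the full treatment):

    -- on the server, a lookup function owned by a superuser:
    CREATE FUNCTION public.pgbouncer_lookup(uname text)
    RETURNS TABLE (usename name, passwd text)
    LANGUAGE sql SECURITY DEFINER
    AS $$ SELECT usename, passwd FROM pg_shadow WHERE usename = uname $$;

    ; then in pgbouncer.ini:
    ; auth_user  = pgbouncer
    ; auth_query = SELECT usename, passwd FROM public.pgbouncer_lookup($1)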


Jeff

No yum repo for CentOS 7 and postgres 16?

2023-10-05 Thread Jeff Ross

Hi all,

CentOS 7 isn't quite dead yet, but it appears that CentOS 7 is not 
included in the new yum repo package that adds 16.


Here's a bit from the latest repo file:

Name        : pgdg-redhat-repo
Version     : 42.0
Release     : 35PGDG
Architecture: noarch
Install Date: (not installed)
Group       : Unspecified
Size        : 15459
License     : PostgreSQL
Signature   : DSA/SHA1, Thu 14 Sep 2023 06:39:48 AM MDT, Key ID 1f16d2e1442df0f8
Source RPM  : pgdg-redhat-repo-42.0-35PGDG.src.rpm
Build Date  : Thu 14 Sep 2023 06:39:32 AM MDT
Build Host  : koji-rhel-9-x86-64-pgbuild
Vendor      : PostgreSQL Global Development Group
URL         : https://yum.postgresql.org
Summary     : PostgreSQL PGDG RPMs- Yum Repository Configuration for Red Hat / Rocky / CentOS
Description :
This package contains yum configuration for Red Hat Enterprise Linux, CentOS,
and also the GPG key for PGDG RPMs.

* Tue Sep 12 2023 Devrim Gündüz - 42.0-35PGDG
- Add v16 repos
- Remove v16 repos from RHEL 7

Really?  Might one inquire as to why? Yes, CentOS 7 is headed for EOL 
but not until June 30, 2024.


Do those of us still on CentOS 7 wanting to upgrade to 16 now have to 
build from source?


Jeff Ross


Re: No yum repo for CentOS 7 and postgres 16?

2023-10-05 Thread Jeff Ross


On 10/5/23 15:46, David G. Johnston wrote:

On Thursday, October 5, 2023, Jeff Ross  wrote:

Hi all,

CentOS 7 isn't quite dead yet but it appears that CentOS7 is not
included in the new yum repo file including 16.

Do those of us still on CentOS 7 wanting to upgrade to 16 now have
to build from source?


https://yum.postgresql.org/news/rhel7-postgresql-rpms-end-of-life/ 
<https://yum.postgresql.org/news/rhel7-postgresql-rpms-end-of-life/>


David J.

Thank you David.  I looked through the postgres mailing lists and didn't 
see a yum specific one.


I'll re-direct this to that list.

Jeff R.


Re: Help diagnosing replication (copy) error

2024-03-08 Thread Jeff Ross

On 3/8/24 14:50, Steve Baldwin wrote:


Hi,

I'm in the process of migrating a cluster from 15.3 to 16.2. We have a 
'zero downtime' requirement so I'm using logical replication to create 
the new cluster and then perform the switch in the application.


I have a situation where all but one table have done their initial 
copy. The remaining table is the largest (of course), and the 
replication slot that is assigned for the copy 
(pg_378075177_sync_60067_7343845372910323059) is showing as 
'active=false' if I select from pg_replication_slots on the publisher.


I've checked the recent logs for both the publishing cluster and the 
subscribing cluster but I can't see any replication errors. I guess I 
could have missed them, but it doesn't seem like anything is being 
'retried' like I've seen in the past with replication errors.


I've used this mechanism for zero-downtime upgrades multiple times in 
the past, and have recently used it to upgrade smaller clusters from 
15.x to 16.2 without issue.


The clusters are hosted on AWS RDS, so I have no access to the 
servers, but if that's the only way to diagnose the issue, I can 
create a support case.


Does anyone have any suggestions as to where I should look for the issue?

Thanks,

Steve


In our setup we're logically replicating a 450G database hosted on real 
hardware to an RDS instance.


Multiple times we've had replication simply stop, and we could never 
find any reason for that on either the publisher or the subscriber.


The *only* solution that ever worked in these cases was dropping the 
subscription in RDS and re-creating it with (copy_data = false).
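
The recipe, for the archives (names are placeholders):

    -- on the subscriber:
    DROP SUBSCRIPTION the_sub;
    CREATE SUBSCRIPTION the_sub
        CONNECTION 'host=publisher dbname=the_db user=rep_user'
        PUBLICATION the_pub
        WITH (copy_data = false);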


At that point replication picks right up again for new transactions, 
*but* at the expense of losing all of the WAL that should have been 
replicated during the outage.  I wrote a python based "logical 
replication fixer" to fill in those gaps.


Given that the subscriber is the one that initiates the connection to 
the publisher, and that replication resumes as soon as the subscription 
is dropped and re-created, my hunch is that this is squarely on RDS.  
With both publisher and subscriber on RDS, as in your case, YMMV.


RDS is a black box--who knows what's really going on there?  It would be 
interesting to see what the response is after you open a support case.  
I hope you'll be able to share that with the list.


Jeff








After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-20 Thread Jeff Ross

Greetings!

I built a trigger-fired process that copies an "order" from our 
production database to our dev database.  An order, in this case, is an 
initial row from a table plus all of the rows in all of the tables in 
that database/schema that are needed to satisfy all of the foreign key 
constraints for the original insert.  Through a web page, one of our 
folks can select a schema and an order id to copy.  That information is 
then inserted into a table.  A trigger attached to that table takes 
care of copying the necessary rows, using a function that uses both 
plpython3u and psycopg2.  I can supply the source code if that will 
help.


On postgresql 10 using plpython2, this function worked great.

After migration to 15 (now 15.5) and a switch to plpython3 (no code 
change needed inside the function) logging inside the function tells me 
that everything completes except the exit.


I then get this error:

NOTICE:  update cargotel_common.copy_orders_to_dev set copy_completed = 't', copy_completed_timestamp = clock_timestamp() where id = 21

ERROR:  cannot commit while a portal is pinned

What the heck?

I did find this error in the source code.  This is from the 15.6 source:

jross@workstation:~/postgresql-15.6$ grep -R -C20 "cannot commit while a portal is pinned" *

The relevant hit is in src/backend/utils/mmgr/portalmem.c, in PreCommit_Portals():

bool
PreCommit_Portals(bool isPrepare)
{
    bool        result = false;
    HASH_SEQ_STATUS status;
    PortalHashEnt *hentry;

    hash_seq_init(&status, PortalHashTable);

    while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
    {
        Portal        portal = hentry->portal;

        /*
         * There should be no pinned portals anymore. Complain if someone
         * leaked one. Auto-held portals are allowed; we assume that whoever
         * pinned them is managing them.
         */
        if (portal->portalPinned && !portal->autoHeld)
            elog(ERROR, "cannot commit while a portal is pinned");

        /*
         * Do not touch active portals --- this can only happen in the case of
         * a multi-transaction utility command, such as VACUUM, or a commit in
         * a procedure.
         *
         * Note however that any resource owner attached to such a portal is
         * still going to go away, so don't leave a dangling pointer.  Also
         * unregister any snapshots held by the portal, mainly to avoid
         * snapshot leak warnings from ResourceOwnerRelease().
         */
        if (portal->status == PORTAL_ACTIVE)
        {
            if (portal->holdSnapshot)
            {
                if (portal->resowner)
                    UnregisterSnapshotFromOwner(portal->holdSnapshot,
                                                portal->resowner);
                portal->holdSnapshot = NULL;
            }
Do I have any idea of how to fix this after reading this bit of code?  No.

This error has been reported here before on December 2, 2022 and in 
other places as well.


https://www.postgresql.org/message-id/1061909348.200334.1669970706749%40mail.yahoo.com

No responses to that message, though.

In the code, this comment:

        /*
         * There should be no pinned portals anymore. Complain if someone
         * leaked one. Auto-held portals are allowed; we assume that whoever
         * pinned them is managing them.
         */

makes me wonder whether this error is specific to plpython3.

I can think of a way to maybe workaround this but thi

Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-20 Thread Jeff Ross

On 3/20/24 16:25, Adrian Klaver wrote:


On 3/20/24 15:18, Jeff Ross wrote:

Greetings!

I built a trigger fired process that copies an "order" from our 
production database to our dev database.  An order, in this case, is 
an initial row from a table and all of the rows in all of the tables 
in that database/schema that are needed to satisfy all of the foreign 
key constraints for the original insert. Through a web page, one of 
our folks can select a schema and an order id to copy.  That 
information is then inserted into a table.  A trigger attached to 
that table takes care of copying the necessary rows using a function 
that uses both plpython3u and psycopg2.  I can supply the source code 
if that will help.


I think that will help, especially the interaction between psycopg2 
and plpython3u.



As requested:

https://openvistas.net/copy_orders_to_dev.html

Jeff





Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-20 Thread Jeff Ross



On 3/20/24 16:50, Tom Lane wrote:

Jeff Ross  writes:

I then get this error:
NOTICE:  update cargotel_common.copy_orders_to_dev set copy_completed =
't', copy_completed_timestamp = clock_timestamp() where id = 21
ERROR:  cannot commit while a portal is pinned

Would you mind supplying a self-contained example that triggers this?

regards, tom lane



I'll see if I can up with something.

Thanks for the reply!

Jeff





Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-21 Thread Jeff Ross

On 3/20/24 17:13, Adrian Klaver wrote:


On 3/20/24 15:52, Jeff Ross wrote:

On 3/20/24 16:25, Adrian Klaver wrote:


On 3/20/24 15:18, Jeff Ross wrote:

Greetings!

I built a trigger fired process that copies an "order" from our 
production database to our dev database.  An order, in this case, 
is an initial row from a table and all of the rows in all of the 
tables in that database/schema that are needed to satisfy all of 
the foreign key constraints for the original insert. Through a web 
page, one of our folks can select a schema and an order id to 
copy.  That information is then inserted into a table.  A trigger 
attached to that table takes care of copying the necessary rows 
using a function that uses both plpython3u and psycopg2.  I can 
supply the source code if that will help.


I think that will help, especially the interaction between psycopg2 
and plpython3u.



As requested:

https://openvistas.net/copy_orders_to_dev.html


1) I have not gone through this thoroughly enough to figure out what 
is going on.


2) Things I have noticed, may not be relevant.

a) from psycopg2 import sql
   Never used.

I have a base template that I use for complex functions--this is part 
of that.  It's never caused a problem before.

b) #prod_database_connection.set_session(autocommit=True)
   #dev_database_connection.set_session(autocommit=True)

   Why are they commented out?


If autocommit is on, the following fails, since each statement then 
commits immediately and the ON COMMIT DROP temp table is gone before 
the next statement runs:

NOTICE:  217, create temp table if not exists load_temp (like wholesale.load including all) on commit drop; truncate load_temp;

ERROR:  psycopg2.errors.UndefinedTable: relation "load_temp" does not exist




c) prod_database_connection_string = "host='pgbouncer' dbname='%s' application_name = '%s'"
   dev_database_connection_string = "host='pgbouncer' dbname='%s' application_name = '%s'"


What version of PgBouncer?


[rossj@cron ~]$ pgbouncer -V
PgBouncer 1.21.0
libevent 2.0.21-stable
adns: evdns2
tls: OpenSSL 1.0.2k-fips  26 Jan 2017


How is it setup?


session mode?




d) Why predefine all those cursors()?

Why not?  It's sort of part of the template, and the psycopg2 docs say 
that cursors are cheap.


e) Why is database global?


I think that's also a holdover from a previous function that I imported 
into this.


Thanks for the reply--it gave me a good chance to do a little cleanup.  
The error is happening when psycopg2 is trying to commit, so I'll also 
ask there.


Jeff





Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross

On 3/20/24 17:04, Tom Lane wrote:


Adrian Klaver  writes:

Haven't had a chance to go through this yet. I'm going to say though
that Tom Lane is looking for a shorter generic case that anyone could
run on their system.

Yeah, it's a long way from that trigger function definition to a
working (i.e. failing) example.  Shortening the trigger might help by
eliminating some parts of the infrastructure that would need to be
shown --- but nobody's going to try to reverse-engineer all that.

regards, tom lane


It took some digging, but I've found a very simple fix for this.

Somewhere (sorry, can't find it again) I read that a postgresql cursor 
is sometimes referred to as a "portal".  This was when I was still 
pretty sure that this was a psycopg2 issue.


Further testing ruled that out--I wasn't getting the error on the 
psycopg2 commit statements, I was getting the error when the plpython3u 
function itself exits and tries to commit.


I only use one plpython3u cursor in that function.  The plpython docs say:

"Cursors are automatically disposed of. But if you want to explicitly 
release all resources held by a cursor, use the |close| method. Once 
closed, a cursor cannot be fetched from anymore."


https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3

Perhaps "pinned" in the error message means "open"?

I added a cursor.close() as the last line called in that function and it 
works again.
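
In outline, the fixed function body looks like this (a sketch; the 
per-row work is elided):

    plpy_cursor = plpy.cursor(schemas_query)
    while True:
        schema_rows = plpy_cursor.fetch(100)
        if not schema_rows:
            break
        # ... copy the rows for each schema ...
    plpy_cursor.close()  # explicitly releases (unpins) the underlying portal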


I haven't been able to come up with a test case that throws the same 
error, though, so I consider this a solution to what is very likely an 
odd corner case.


Jeff


Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross

On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).


I added a cursor.close() as the last line called in that function and it
works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my pay grade ;-)

I'm just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff


Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross



On 3/27/24 17:41, Adrian Klaver wrote:

On 3/27/24 16:35, Rob Sargent wrote:



On 3/27/24 17:05, Jeff Ross wrote:


On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).

I added a cursor.close() as the last line called in that function 
and it

works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

    regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my pay grade ;-)

I'm just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff

My read of Tom's reply suggests you still have work to do to find the 
other "reference" holding on to your cursor.


I would start with:

def logging(comment):
    global database
    <...>


Already removed that--thanks, though.




Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

2024-03-27 Thread Jeff Ross


On 3/27/24 17:35, Rob Sargent wrote:



On 3/27/24 17:05, Jeff Ross wrote:


On 3/27/24 15:44, Tom Lane wrote:


Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).


I added a cursor.close() as the last line called in that function and it
works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

regards, tom lane



Thank you for your reply, as always, Tom!

Debugging at this level might well be over my pay grade ;-)

I'm just happy that the function works again, and that I was able to 
share a solution to this apparently rare error with the community.


Jeff

My read of Tom's reply suggests you still have work to do to find the 
other "reference" holding on to your cursor.


Yes, my read was the same.

There are exactly 3 references to that cursor now that I added the 
close() at the end.


Here are the first 2 (cursor renamed from the code I posted):

    plpy_cursor = plpy.cursor(schemas_query)
    while True:
        schema_rows = plpy_cursor.fetch(100)

The last is:

    plpy_cursor.close()

I don't know how to proceed further.


Re: Faster data load

2024-09-05 Thread Jeff Ross

On 9/5/24 14:14, Lok P wrote:


Hi,

We have a requirement to create approx 50 billion rows in a partitioned 
table (~1 billion rows per partition, 200+ GB daily partitions) for a 
performance test.  We are currently using an "insert into <target 
partition> select .. from <source table>" method.  We dropped all 
indexes and constraints first and then did the load.  Still it's taking 
2-3 hours to populate one partition.  Is there a faster way to achieve 
this?


A few teammates suggest using the COPY command with a file load 
instead, which will be faster.  So I wanted to understand what it does 
differently behind the scenes compared to the insert-as-select method, 
since both only deal with the SQL engine.


Additionally, when we were trying to create indexes post data load on 
one partition, it took 30+ minutes. Any possible way to make it faster?


Is there any way to drive the above things in parallel by utilizing 
full database resources?


It's postgres 15.4

Regards
Lok


Try pg_bulkload to load the data--it takes a little setup but it is 
very fast.  Do pay attention to the caveats; for a performance test 
they probably won't be relevant.


https://github.com/ossc-db/pg_bulkload?tab=readme-ov-file

Jeff





Removing a subscription that does not exist

2021-07-09 Thread Jeff Ross

Hello,

I'm working with an RDS instance running 12 that has an old subscription 
that I can't seem to drop.


The logs show this, repeating every 5 seconds or so.

2021-07-09 16:08:07 UTC::@:[1637]:LOG: logical replication apply worker for subscription "metro" has started
2021-07-09 16:08:07 UTC::@:[1637]:ERROR: could not connect to the publisher: could not connect to server: Connection refused

Is the server running on host "dbp3" (108.200.30.101) and accepting
TCP/IP connections on port 5433?

dbp3 is long gone--the server no long exists.

It shows up here:

mirror_admin@metro_logical> select * from pg_subscription;

  oid  │ subdbid │     subname     │ subowner │ subenabled │                          subconninfo                           │   subslotname   │ subsynccommit │    subpublications
───────┼─────────┼─────────────────┼──────────┼────────────┼────────────────────────────────────────────────────────────────┼─────────────────┼───────────────┼────────────────────────
 83645 │   66754 │ cargowel_common │    16394 │ t          │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod  │ cargowel_common │ off           │ {cargowel_common_prod}
 83646 │   66754 │ metro_prod      │    16394 │ t          │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod  │ metro_prod      │ off           │ {metro_prod}
 51490 │   14313 │ metro           │    16394 │ t          │ dbname=metro host=dbp3 port=5433 user=repmgr                   │ metro           │ off           │ {metro}
(3 rows)

Time: 28.627 ms

But not in here:

mirror_admin@metro_logical> \dRs+
                                                 List of subscriptions
      Name       │    Owner     │ Enabled │      Publication       │ Synchronous commit │                           Conninfo
─────────────────┼──────────────┼─────────┼────────────────────────┼────────────────────┼────────────────────────────────────────────────────────────────
 cargowel_common │ mirror_admin │ t       │ {cargowel_common_prod} │ off                │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod
 metro_prod      │ mirror_admin │ t       │ {metro_prod}           │ off                │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod
(2 rows)

And it can't be disabled or dropped:

mirror_admin@metro_logical> alter subscription metro disable;
ERROR:  subscription "metro" does not exist
Time: 24.263 ms
mirror_admin@metro_logical> drop subscription metro;
ERROR:  subscription "metro" does not exist
Time: 23.648 ms

I did try deleting it directly from the pg_subscription table but that 
failed with a permission denied error.  My suspicion is that's because 
of the RDS environment.


What else can I try to remove this old non-functional subscription?

Thanks,

Jeff Ross


Re: Removing a subscription that does not exist

2021-07-12 Thread Jeff Ross

On 7/11/21 7:38 PM, Kyotaro Horiguchi wrote:

At Fri, 9 Jul 2021 10:49:46 -0600, Jeff Ross  wrote in

Hello,

I'm working with an RDS instance running 12 that has an old
subscription that I can't seem to drop.

...
Look at the subdbid field in the first query result.  You were logged 
into the database with OID=66754, and the subscription "metro" belongs 
to the database 14313.  The second command doesn't show metro because 
it is not in the current database.



| What else can I try to remove this old non-functional subscription?

...

Thus you need to log in to the database OID=14313 to manipulate the 
subscription metro.

regards.

That was it exactly.  Once I connected to that database the 
subscription could be disabled, its slot_name set to NONE, and finally 
dropped.
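
For the archives, the sequence once connected to the right database:

    ALTER SUBSCRIPTION metro DISABLE;
    ALTER SUBSCRIPTION metro SET (slot_name = NONE);
    DROP SUBSCRIPTION metro;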


Thank you!

Jeff


NOTIFY queue is at 66% and climbing...

2021-10-13 Thread Jeff Ross

Hi all,

On 10.15 I'm getting the following on a logically replicated server.

From the CSV logs:

2021-10-13 18:49:39.792 EDT,,,213601,,6143c257.34261,64243,,2021-09-16 18:16:55 EDT,4/3914851,60709901,WARNING,01000,"NOTIFY queue is 66% full",""
2021-10-13 18:49:46.058 EDT,,,213601,,6143c257.34261,64244,,2021-09-16 18:16:55 EDT,4/3914855,60709905,WARNING,01000,"NOTIFY queue is 66% full",""
2021-10-13 18:49:51.934 EDT,,,213601,,6143c257.34261,64245,,2021-09-16 18:16:55 EDT,4/3914862,60709915,WARNING,01000,"NOTIFY queue is 66% full",""
2021-10-13 18:50:00.516 EDT,,,213601,,6143c257.34261,64246,,2021-09-16 18:16:55 EDT,4/3914864,60709917,WARNING,01000,"NOTIFY queue is 66% full",""
2021-10-13 18:50:08.003 EDT,,,213601,,6143c257.34261,64247,,2021-09-16 18:16:55 EDT,4/3914871,60709926,WARNING,01000,"NOTIFY queue is 66% full",""


The very fine docs for 10 say

"There is a queue that holds notifications that have been sent but not 
yet processed by all listening sessions. If this queue becomes full, 
transactions calling |NOTIFY| will fail at commit. The queue is quite 
large (8GB in a standard installation) and should be sufficiently sized 
for almost every use case. However, no cleanup can take place if a 
session executes |LISTEN| and then enters a transaction for a very long 
time. Once the queue is half full you will see warnings in the log file 
pointing you to the session that is preventing cleanup. In this case you 
should make sure that this session ends its current transaction so that 
cleanup can proceed."


We do have a client that has access to this logically replicated 
server, and I have no idea what they have done regarding LISTEN/NOTIFY; 
my query to them about this has not yet yielded a response.


In the CSV logs above, what part points to "the session that is 
preventing cleanup" so that I can kill it?  pg_stat_activity yields no 
clues.
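
(The queue fraction itself is easy enough to watch--

    select pg_notification_queue_usage();

returns a value between 0 and 1--but that doesn't identify the blocking 
session.)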


Thanks,

Jeff


Re: NOTIFY queue is at 66% and climbing...

2021-10-14 Thread Jeff Ross




On 10/13/21 5:50 PM, Tom Lane wrote:

Jeff Ross  writes:

On 10.15 I'm getting the following on a logically replicated server.
2021-10-13 18:49:39.792 EDT,,,213601,,6143c257.34261,64243,,2021-09-16
18:16:55 EDT,4/3914851,60709901,WARNING,01000,"NOTIFY queue is 66%
full",""
In the CSV logs above what part points to "the session that is
preventing cleanup" so that I can kill it?

Normally there's a DETAIL entry citing the session's PID.  Looking
at the code, the reason for the lack of any such entry must be that
there is no session whose current notify queue position exactly
matches the supposed global minimum position.  This corresponds to
a known bug that was fixed in 10.16, so I'd suggest upgrading.
As a temporary workaround you could restart that server, but
likely the problem would recur after awhile.

regards, tom lane

Thanks as always, Tom.  I'll schedule the upgrade to 10.18 and restart 
for this weekend.


Jeff




Re: Query performance issue

2024-10-22 Thread Jeff Ross

On 10/21/24 23:31, yudhi s wrote:



On Thu, Oct 17, 2024 at 3:06 AM Peter J. Holzer  wrote:


The execution plan looks like a postgresql execution plan, not a mysql
execution plan. Did you run this query on postgresql? That may be
interesting for comparison purposes, but ultimately it is useless: You
won't get mysql to work like postgresql, and any tips to speed up this
query on postgresql (which is all you can expect on a postgresql
mailing list) probably won't work on mysql.



I tried running the same query in postgres, and below is the plan from 
both postgres and mysql.  Can you please guide me to understand if 
anything else can be done to make it better?


https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8

Here's your postgres query as analyzed and made more readable by the 
most excellent explain.depesz.com.


https://explain.depesz.com/s/VyeM#html

Under the hints tab are suggestions to bump your work_mem to avoid 
writing sorts out to disk.
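
A session-level bump is an easy way to test that hint before changing 
anything globally--the value here is just a starting point:

    set work_mem = '256MB';

--then re-run the explain analyze and see whether the sorts stay in 
memory.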


Jeff


Re: Removing terminal period from varchar string in table column

2025-07-15 Thread Jeff Ross

On 7/15/25 11:30, Rich Shepard wrote:


I want to remove the terminal period '.' from the varchar strings in 
the 'company_name' column in all rows with that period in the companies 
table.

I've looked at trim(), translate(), "substr(company_name, 1, 
length(company_name) - 1)", and a couple of other functions, and am 
unsure how best to do this without corrupting the database table.

Advice needed.

TIA,

Rich




How about

test:

    select company_name, replace(company_name,'.','') from companies;

update:

    update companies set company_name = replace(company_name,'.','') where company_name like '%.';
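
Note that replace() strips every period in the name, not just the 
trailing one, so any interior periods would be lost too.  If that 
matters, dropping only the final character is safer--left() with a 
negative length keeps all but the last character:

    update companies set company_name = left(company_name, -1) where company_name like '%.';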


?

Jeff