Re: COPY FROM - to avoid WAL generation
On 8/21/18 9:00 AM, Ravi Krishna wrote: In a recent thread of mine I learned something very interesting. If a table is created and data is loaded via COPY FROM within the same transaction, then PG will be smart enough to not generate WAL logs because all it needs to do is to track the status of the transaction and let the data load go to the new data file created for the table. If committed, the table is released for other sessions; if rolled back, vacuum will delete the data file later on. I tested it as follows for a table with 50 million rows. No indexes. Case 1 - create the table first. - in a separate transaction load the 50 million rows. Took 3 min 22 seconds Case 2 - start transaction - create table - load 50 million rows - commit transaction Took: 3 min 16 seconds. Am I missing anything? Have you looked into pg_bulkload? https://github.com/ossc-db/pg_bulkload Docs are here: http://ossc-db.github.io/pg_bulkload/index.html Jeff
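For reference, a minimal sketch of the Case 2 pattern being timed above (table name, file path and settings are assumptions, not taken from the thread). One thing worth noting: the WAL-skipping optimization for a table created in the same transaction only applies when wal_level is set to minimal, which could explain why the two timings came out nearly identical.

begin;
create table load_test (id bigint, payload text);              -- new relfilenode belongs to this transaction
copy load_test from '/tmp/load_test.csv' with (format csv);    -- can skip WAL only under wal_level = minimal
commit;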
Debugging an intermittent foreign key insert error with csvlog
Hi all, I've been debugging an intermittent foreign key insert error on our single database / multi-tenant server. To help isolate traffic by tenant, I've switched to using csvlog for the duration and have set log_min_duration_statement to 0 to get *everything*. Fortunately, daily 80G csvlogs compress nicely. For lack of a readable way to paste the 12 lines of relevant csvlog into an e-mail, I've uploaded a very small 3K csv file to my web server at https://openvistas.net/hansens_error.csv The bare bones of the issue involve inserting a row into a table named load_det, then getting the serial sequence of that table, getting the last_value of that sequence and then inserting into another table named cargo_det using that retrieved last_value as the foreign key that ties this row to the load_det table. The vast majority of these succeed without issue but not all. The csvlog snip shows what I believe are 2 simultaneous but separate sessions, and the session that attempts to insert into the cargo_det table is not the same session that inserted into the load_det table. That's my hunch, but what is unclear to me is whether those separate sessions are also in separate transactions. csvlog has a couple of columns that I'm unclear about: session_start_time, virtual_transaction_id. Is session_start_time the time inside a transaction block, as in beginning with a begin but before a commit or rollback? Or is it maybe just how long this pgbouncer session has been connected? virtual_transaction_id is defined in the docs as backendID/localXID--do separate backendIDs also represent separate transactions? Is there a better way to determine separate transactions within csvlog? Also, the app code that does this is legacy perl using DBD::Pg but the original code was written for Informix. We've been in the process of moving off Informix for a while now and should be done within the month. I intend to re-work this to use returning id (available in postgres since 8.2!) instead of the serial sequence / last_value hack but not quite there yet. Thanks, Jeff Ross
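As a postscript, a hedged sketch of the RETURNING rework mentioned at the end (the column names are made up for illustration); it removes the separate last_value lookup entirely by carrying the generated id into the child insert in one statement:

with new_load as (
    insert into load_det (customer, created)
    values ('ACME', now())
    returning id
)
insert into cargo_det (load_det_id, description)
select id, 'pallet 1 of 3' from new_load;

In the DBD::Pg code the simpler form is just insert ... returning id and binding that value for the second insert, which also sidesteps any cross-session confusion about last_value.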
Re: Debugging an intermittent foreign key insert error with csvlog
On 9/16/19 4:07 PM, Adrian Klaver wrote: On 9/16/19 1:46 PM, Jeff Ross wrote: The csvlog snip shows what I believe are 2 simultaneous but separate sessions and the session that attempts to insert into the cargo_det table is not the same session that inserted into the load_det table. That's what my hunch is but what is unclear to me is if those separate sessions are also in separate transactions. To me it looks like the INSERT into load_det and into cargo_det are occurring in the same transaction(934281062). The part that would concern me is that: select last_value from load_det_id_seq occurs in different transactions and sessions. From here: https://www.postgresql.org/docs/11/sql-createsequence.html "Also, last_value will reflect the latest value reserved by any session, whether or not it has yet been returned by nextval." Especially as the error is coming from a different transaction(934281063) and session then the INSERTs. I'm guessing that there is cross talk on the sequence number fetch and application to cargo_det. Thank you Adrian--I think my hunch was basically correct then. Now all I need to do is figure out why we have multiple sessions and transactions. Or it might be time to skip ahead and get rid of the last_value query. Jeff
Re: Query which shows FK child columns?
On 11/14/19 11:49 AM, Ron wrote: v9.6.16 I have a query which shows the parents and children in FK relations, along with the parent column name, but can't seem to find the child column names. Is there a way to find the child column names without having to dig into pg_constraint? Thanks I do not think you can do this without using pg_constraint. I've been using this function to display those FKs. The original code isn't mine but as I recall I had to tweak it a little. This is on 10 and I can't remember if this was used on 9.6 but I'd be surprised if any of this won't work on 9.6. client@cargotel_dev> \sf cargotel_common.show_foreign_keys(text) CREATE OR REPLACE FUNCTION cargotel_common.show_foreign_keys(tablename text) RETURNS TABLE(table1 text, column1 text, type text, table2 text, column2 text) LANGUAGE plpgsql AS $function$ declare schemaname text; begin select into schemaname current_schema(); return query execute format(' select conrelid::regclass::text as table1, a.attname::text as column1, t.typname::text as type, confrelid::regclass::text as table2, af.attname::text as column2 from pg_attribute af, pg_attribute a, pg_type t, ( select conrelid, confrelid, conkey[i] as conkey, confkey[i] as confkey from ( select conrelid, confrelid, conkey, confkey, generate_series(1,array_upper(conkey,1)) as i from pg_constraint where contype = ''f'' ) ss) ss2 where af.attnum = confkey and af.attrelid = confrelid and a.attnum = conkey and a.attrelid = conrelid and a.atttypid = t.oid and confrelid::regclass = ''%I.%I''::regclass order by 1,2;',schemaname,tablename); end; $function$ I use column headings "table 1, column1, table2, column2" but It's easy enough to tweak the column labels. Example: client@cargotel_dev> \d+ ref_acct_cache Table "client.ref_acct_cache" Column │ Type │ Collation │ Nullable │ Default │ Storage │ Stats target │ Description ┼─┼───┼──┼┼──┼──┼─ id │ integer │ │ not null │ nextval('ref_acct_cache_id_seq'::regclass) │ plain │ │ descr │ text │ │ │ │ extended │ │ Indexes: "ref_acct_cache_pkey" PRIMARY KEY, btree (id) Referenced by: TABLE "acct_cache" CONSTRAINT "acct_cache_type_id_ref_acct_cache_id_fk" FOREIGN KEY (type_id) REFERENCES ref_acct_cache(id) client@cargotel_dev> select * from cargotel_common.show_foreign_keys('ref_acct_cache'); table1 │ column1 │ type │ table2 │ column2 ┼─┼──┼┼─ acct_cache │ type_id │ int4 │ ref_acct_cache │ id (1 row) client@cargotel_dev> \d+ acct_cache Table "client.acct_cache" Column │ Type │ Collation │ Nullable │ Default │ Storage │ Stats target │ Description ───┼──┼───┼──┼┼──┼──┼─ id │ integer │ │ not null │ nextval('acct_cache_id_seq'::regclass) │ plain │ │ type_id │ integer │ │ │ │ plain │ │ prefix │ text │ │ │ │ extended │ │ data │ text │ │ │ │ extended │ │ amount │ numeric │ │ │ │ main │ │ timestamp │ timestamp with time zone │ │ │ │ plain │ │ check_number │ text │ │ │ │ extended │ │ client_number │ text │ │ │ │ extended │ │ check_date │ date │ │ │ │ plain │ │ Indexes: "acct_cache_pkey" PRIMARY KEY, btree (id) "acct_cache_prefix_type_id_data_idx" btree (prefix, type_id, data) "acct_ca
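For the archives, a shorter standalone query in the same spirit (still pg_constraint under the hood, and the referenced table name is only an example); it should work on 9.4 and later since it relies on multi-argument unnest:

select c.conrelid::regclass                   as table1,
       a.attname                              as column1,
       format_type(a.atttypid, a.atttypmod)   as type,
       c.confrelid::regclass                  as table2,
       af.attname                             as column2
from pg_constraint c
cross join lateral unnest(c.conkey, c.confkey) as k(conkey, confkey)
join pg_attribute a  on a.attrelid  = c.conrelid  and a.attnum  = k.conkey
join pg_attribute af on af.attrelid = c.confrelid and af.attnum = k.confkey
where c.contype = 'f'
  and c.confrelid = 'ref_acct_cache'::regclass
order by 1, 2;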
Re: Question: what is proper way to define python function as event_trigger?
On 2019-12-22 15:27, Andrei Pozolotin wrote: Hello. Problem: 1. any attempt to define python function as an event_trigger, i.e.: CREATE FUNCTION public.verify() RETURNS event_trigger LANGUAGE 'plpython3u' AS $$ print("hello-kitty") $$; 2. fails with message: ERROR: trigger functions can only be called as triggers SQL state: 0A000 3. here in the source: https://github.com/postgres/postgres/blob/master/src/pl/plpython/plpy_procedure.c#L226 Question: what is proper way to define python function as event_trigger? Thank you. Just do "create function public.verify() as trigger..." https://www.postgresql.org/docs/10/plpython-trigger.html TD["event"] contains the type of event as a string and I routinely do things like if TD["event"] == "UPDATE": #do update stuff elif TD["event"] == "INSERT": #do insert related stuff Jeff
Re: Question: what is proper way to define python function as event_trigger?
On 2019-12-22 16:07, Jeff Ross wrote: On 2019-12-22 15:27, Andrei Pozolotin wrote: Hello. Problem: 1. any attempt to define python function as an event_trigger, i.e.: CREATE FUNCTION public.verify() RETURNS event_trigger LANGUAGE 'plpython3u' AS $$ print("hello-kitty") $$; 2. fails with message: ERROR: trigger functions can only be called as triggers SQL state: 0A000 3. here in the source: https://github.com/postgres/postgres/blob/master/src/pl/plpython/plpy_procedure.c#L226 Question: what is proper way to define python function as event_trigger? Thank you. Just do "create function public.verify() as trigger..." My bad--that should be "create function public.verify() returns trigger... " Jeff
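For completeness, here is what a minimal row-level trigger in plpython3u looks like end to end (table and trigger names are placeholders; EXECUTE PROCEDURE is used because it works on 10 as well as newer releases):

create or replace function public.verify() returns trigger
language plpython3u as $$
if TD["event"] == "UPDATE":
    plpy.notice("update on %s" % TD["table_name"])
elif TD["event"] == "INSERT":
    plpy.notice("insert on %s" % TD["table_name"])
return None   # None (or "OK") lets the row through unmodified
$$;

create trigger verify_trg
    before insert or update on public.some_table
    for each row execute procedure public.verify();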
Getting more detail in plpython error messages
Hi all, In psql a database error will print both ERROR: and DETAIL: lines. postgres@testdb# delete from inspection where bundle_id in (select id from test_archive_20170401.load order by id); ERROR: update or delete on table "inspection" violates foreign key constraint "inspection_weather_inspection_id_inspection_id_fk" on table "inspection_weather" DETAIL: Key (id)=(158967) is still referenced from table "inspection_weather". With plpython (both u and 3u) all I see printed is the ERROR part. try: check = plpy.execute("delete from inspection where bundle_id in (select id from test_archive_20170401.load order by id)") except plpy.SPIError as e: plpy.notice("Error!", e) postgres@testdb# select * from test_delete(); NOTICE: ('Error!', ForeignKeyViolation('update or delete on table "inspection" violates foreign key constraint "inspection_weather_inspection_id_inspection_id_fk" on table "inspection_weather"',)) Is there a way to get the DETAIL part as well? Thanks, Jeff
Re: Getting more detail in plpython error messages
On 3/7/22 11:06 AM, Tom Lane wrote: Jeff Ross writes: Is there a way to get the DETAIL part as well? It's not very well documented AFAICS, but a SPIError object has a "detail" attribute, so "e.detail" should help you. It looks like you might prefer to print "e.spidata", which seems to contain all the available fields. regards, tom lane Thank you, Tom! As always, that is exactly what I need. Jeff
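For anyone finding this later, a sketch of what that looks like in the handler (the query is the one from my earlier message; the function name and return values are illustrative):

create or replace function test_delete() returns text
language plpython3u as $$
try:
    plpy.execute("delete from inspection where bundle_id in "
                 "(select id from test_archive_20170401.load order by id)")
except plpy.SPIError as e:
    plpy.notice("Error: %s" % e)
    plpy.notice("Detail: %s" % e.detail)          # the DETAIL line psql shows
    plpy.notice("All fields: %s" % (e.spidata,))  # everything available, per Tom's note
    return 'failed'
return 'ok'
$$;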
Logically replicated table has no visible rows
Hello, We have a logically replicated table on RDS that is 39 G in size on both the publisher (10.21) and the subscriber (12.8). The replication slots on the publisher are all marked as active and the lsns are current so no lag. Other tables on the subscriber side are also identical in size and have no problem with queries against them. We did a vacuum full on the subscriber table and the size dropped to a couple hundred MBs and is growing but still has no visible rows. Now on the publisher I see a temporary replication slot so I'm guessing that in reality replication had not finished and that's why the load_events table was full sized but had no visible rows? At this point I guess my questions are does my hunch that replication hadn't completely finished is the root cause of what we saw make sense? I've always used both the lsns and the lack of temporary replication slots to tell when logical replication had finished--is there a more authoritative way to do so? Thanks, Jeff
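As an aside for later readers, the per-table sync state is tracked in pg_subscription_rel on the subscriber, which is probably the most authoritative check (a plain catalog query, nothing specific to this cluster); anything not yet in state 'r' (ready) has not finished its initial copy:

select s.subname, sr.srrelid::regclass as table_name, sr.srsubstate
from pg_subscription_rel sr
join pg_subscription s on s.oid = sr.srsubid
where sr.srsubstate <> 'r'
order by 1, 2;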
Re: Logically replicated table has no visible rows
On 5/31/22 11:46 AM, Jeff Ross wrote: Hello, We have a logically replicated table on RDS that is 39 G in size on both the publisher (10.21) and the subscriber (12.8). The replication slots on the publisher are all marked as active and the lsns are current so no lag. Other tables on the subscriber side are also identical in size and have no problem with queries against them. We did a vacuum full on the subscriber table and the size dropped to a couple hundred MBs and is growing but still has no visible rows. Now on the publisher I see a temporary replication slot so I'm guessing that in reality replication had not finished and that's why the load_events table was full sized but had no visible rows? At this point I guess my questions are does my hunch that replication hadn't completely finished is the root cause of what we saw make sense? I've always used both the lsns and the lack of temporary replication slots to tell when logical replication had finished--is there a more authoritative way to do so? Thanks, Jeff As a follow up for the archives... It became apparent the longer we looked that logical replication in fact had not finished and indeed never did finish. On both an EC2 server and an RDS server we were missing at least one table that by table size appeared to be identical to the publisher's table and yet had no visible rows. On the publisher I would occasionally see temporary replication slots but they would go away in a few minutes. I then dropped the subscription on the RDS server and immediately restarted it with (copy_data = False). As expected we immediately saw rows being inserted into that massive table but it still showed only those rows. I have a "backfill" process to sync up logically replicated tables and that ran but very, very slowly. Finally while syncing that big table it stalled completely. Network monitoring showed about 100Kb/s traffic to the RDS server. That is not a typo. Our client spun up another RDS server, imported all the settings from the previous one and I setup the database and imported the schemas. As soon as we started replication we were seeing network transfers in the 45Mb/s range. Replication finished with all tables intact in under 5 hours for 420G. We changed nothing on our side so whatever was causing the glacial data transfer was on AWS side. Jeff
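For the archives, the drop-and-recreate sequence described above looks roughly like this on the subscriber (subscription, publication and connection strings are placeholders); note that copy_data = false only resumes replication for new changes, so anything missed during the outage still has to be reconciled separately, which is what the backfill process handles:

drop subscription big_sub;
create subscription big_sub
    connection 'host=publisher.example.com dbname=prod user=repl_user'
    publication big_pub
    with (copy_data = false);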
15 pg_upgrade with -j
Hello! We are moving from 10 to 15 and are in testing now. Our development database is about 1400G and takes 12 minutes to complete a pg_upgrade with the -k (hard-links) version. This is on a CentOS 7 server with 80 cores. Adding -j 40 to use half of those cores also finishes in 12 minutes and ps / top/ htop never show more than a single process at a time in use. Bumping that to -j 80 to use them all also finishes in 12 minutes and still only a single process. Running the suggested vacuum analyze after pg_upgrade completes takes about 19 minutes. Adding -j 40 takes that time down to around 5 minutes, jumps the server load up over 30 and htop shows 40 processes. If -j 40 helps there--why not with pg_upgrade? The full commands we are using for pg_upgrade are pretty stock: time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 40 time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 80 Our production database is closer to 1900G. If we're looking at a 30 minute pg_upgrade window we'll be okay but if there is anything we can do to knock that time down we will and any suggestions to do so would be greatly appreciated. Jeff Ross
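For reference, and assuming the post-upgrade analyze was run with vacuumdb (which is what pg_upgrade suggests and what accepts -j), the parallel form looks like this; paths and job count mirror the pg_upgrade commands above and should be adjusted to the hardware:

time /usr/pgsql-15/bin/vacuumdb --all --analyze-in-stages -j 40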
Re: 15 pg_upgrade with -j
On 5/22/23 5:24 PM, Adrian Klaver wrote: On 5/22/23 16:20, Jeff Ross wrote: Hello! We are moving from 10 to 15 and are in testing now. Our development database is about 1400G and takes 12 minutes to complete a pg_upgrade with the -k (hard-links) version. This is on a CentOS 7 server with 80 cores. Adding -j 40 to use half of those cores also finishes in 12 minutes and ps / top/ htop never show more than a single process at a time in use. Bumping that to -j 80 to use them all also finishes in 12 minutes and still only a single process. Running the suggested vacuum analyze after pg_upgrade completes takes about 19 minutes. Adding -j 40 takes that time down to around 5 minutes, jumps the server load up over 30 and htop shows 40 processes. If -j 40 helps there--why not with pg_upgrade? From docs: https://www.postgresql.org/docs/current/pgupgrade.html The --jobs option allows multiple CPU cores to be used for copying/linking of files and to dump and restore database schemas in parallel; a good place to start is the maximum of the number of CPU cores and tablespaces. This option can dramatically reduce the time to upgrade a multi-database server running on a multiprocessor machine. So is the 1400G mostly in one database in the cluster? The full commands we are using for pg_upgrade are pretty stock: time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 40 time /usr/pgsql-15/bin/pg_upgrade -b /usr/pgsql-10/bin/ -B /usr/pgsql-15/bin/ -d /var/lib/pgsql/10/data -D /var/lib/pgsql/15up -k -j 80 Our production database is closer to 1900G. If we're looking at a 30 minute pg_upgrade window we'll be okay but if there is anything we can do to knock that time down we will and any suggestions to do so would be greatly appreciated. Jeff Ross Yes, one big database with about 80 schemas and several other smaller databases so -j should help, right? Jeff
Re: 15 pg_upgrade with -j
On 5/22/23 5:42 PM, Tom Lane wrote: Jeff Ross writes: On 5/22/23 5:24 PM, Adrian Klaver wrote: So is the 1400G mostly in one database in the cluster? Yes, one big database with about 80 schemas and several other smaller databases so -j should help, right? AFAICT from a quick look at the code, you won't get any meaningful parallelism unless you have several large DBs and/or several large tablespaces. It looks like the assumption was that issuing link() requests in parallel wouldn't help much but just swamp your disk if they're all on the same filesystem. Maybe that could use rethinking, not sure. regards, tom lane Thanks Tom. These are all smokingly fast SSDs so it would be interesting to see how well they'd hold up under that load. Jeff
Re: 15 pg_upgrade with -j
On 5/22/23 5:43 PM, Adrian Klaver wrote: From docs: https://www.postgresql.org/docs/current/pgupgrade.html The --jobs option allows multiple CPU cores to be used for copying/linking of files and to dump and restore database schemas in parallel; a good place to start is the maximum of the number of CPU cores and tablespaces. This option can dramatically reduce the time to upgrade a multi-database server running on a multiprocessor machine. So is the 1400G mostly in one database in the cluster? The full commands we are using for pg_upgrade are pretty stock: Yes, one big database with about 80 schemas and several other smaller databases so -j should help, right? As I understand it no. That the parallelism is between databases not within a database. Further that 'database schemas' refers to schema as the overall database object definitions not the namespaces known as schemas in the database. Thanks Adrian. That "restore database schemas in parallel" phrase seems like it would be really easy to read like we did and expect it to work with one database and multiple schemas. Maybe it should be changed to "restore multiple databases in parallel" instead? Jeff
Re: Question: Multiple pg clusters on one server can be reached with the standard port.
We have already looked at pgbouncer and it works with that but unfortunately you have to do the authentication in pgbouncer. Which we don't like so much. Regards, Michael You can set up pgbouncer to authenticate in postgres: https://www.2ndquadrant.com/en/blog/understanding-user-management-in-pgbouncer/ See the "auth_query, auth_user" section. https://www.2ndquadrant.com/en/blog/pg-phriday-securing-pgbouncer/ Jeff
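Roughly, the pattern from those posts (all names below are placeholders): pgbouncer keeps only one low-privilege credential, and per-user passwords stay in Postgres behind a SECURITY DEFINER lookup that auth_query calls. The pgbouncer.ini side is just auth_user = pgb_auth and auth_query = SELECT usename, passwd FROM pgbouncer_lookup($1); the database side is something like:

create role pgb_auth login password 'change-me';
create function pgbouncer_lookup(p_user text)
returns table(usename name, passwd text)
language sql security definer as
$$ select usename, passwd from pg_catalog.pg_shadow where usename = p_user $$;
revoke all on function pgbouncer_lookup(text) from public;
grant execute on function pgbouncer_lookup(text) to pgb_auth;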
No yum repo for CentOS 7 and postgres 16?
Hi all, CentOS 7 isn't quite dead yet but it appears that CentOS7 is not included in the new yum repo file including 16. Here's a bit from the latest repo file: Name : pgdg-redhat-repo Version : 42.0 Release : 35PGDG Architecture: noarch Install Date: (not installed) Group : Unspecified Size : 15459 License : PostgreSQL Signature : DSA/SHA1, Thu 14 Sep 2023 06:39:48 AM MDT, Key ID 1f16d2e1442df0f8 Source RPM : pgdg-redhat-repo-42.0-35PGDG.src.rpm Build Date : Thu 14 Sep 2023 06:39:32 AM MDT Build Host : koji-rhel-9-x86-64-pgbuild Vendor : PostgreSQL Global Development Group URL : https://yum.postgresql.org Summary : PostgreSQL PGDG RPMs- Yum Repository Configuration for Red Hat / Rocky / CentOS Description : This package contains yum configuration for Red Hat Enterprise Linux, CentOS, and also the GPG key for PGDG RPMs. * Tue Sep 12 2023 Devrim Gündüz - 42.0-35PGDG - Add v16 repos - Remove v16 repos from RHEL 7 Really? Might one inquire as to why? Yes, CentOS 7 is headed for EOL but not until June 30, 2024. Do those of us still on CentOS 7 wanting to upgrade to 16 now have to build from source? Jeff Ross
Re: No yum repo for CentOS 7 and postgres 16?
On 10/5/23 15:46, David G. Johnston wrote: On Thursday, October 5, 2023, Jeff Ross wrote: Hi all, CentOS 7 isn't quite dead yet but it appears that CentOS7 is not included in the new yum repo file including 16. Do those of us still on CentOS 7 wanting to upgrade to 16 now have to build from source? https://yum.postgresql.org/news/rhel7-postgresql-rpms-end-of-life/ <https://yum.postgresql.org/news/rhel7-postgresql-rpms-end-of-life/> David J. Thank you David. I looked through the postgres mailing lists and didn't see a yum specific one. I'll re-direct this to that list. Jeff R.
Re: Help diagnosing replication (copy) error
On 3/8/24 14:50, Steve Baldwin wrote: Hi, I'm in the process of migrating a cluster from 15.3 to 16.2. We have a 'zero downtime' requirement so I'm using logical replication to create the new cluster and then perform the switch in the application. I have a situation where all but one table have done their initial copy. The remaining table is the largest (of course), and the replication slot that is assigned for the copy (pg_378075177_sync_60067_7343845372910323059) is showing as 'active=false' if I select from pg_replication_slots on the publisher. I've checked the recent logs for both the publishing cluster and the subscribing cluster but I can't see any replication errors. I guess I could have missed them, but it doesn't seem like anything is being 'retried' like I've seen in the past with replication errors. I've used this mechanism for zero-downtime upgrades multiple times in the past, and have recently used it to upgrade smaller clusters from 15.x to 16.2 without issue. The clusters are hosted on AWS RDS, so I have no access to the servers, but if that's the only way to diagnose the issue, I can create a support case. Does anyone have any suggestions as to where I should look for the issue? Thanks, Steve In our setup we're logically replicating a 450G database hosted on real hardware to an RDS instance. Multiple times we've had replication simply stop and we could never find any reason for that on either publisher or subscriber. The *only* solution that ever worked in these cases was dropping the subscription in RDS and re-creating it with (copy_data = false). At that point replication picks right up again for new transactions *but* at the expense of losing all of the WAL that should have been replicated during the outage. I wrote a python based "logical replication fixer" to fill in those gaps. Given that the subscriber is the one that initiates the connection to the publisher and that as soon as the subscription is dropped and restarted replication resumes my hunch is that this is squarely on RDS. With both publisher and subscriber on RDS as in your case YMMV. RDS is a black box--who knows what's really going on there? It would be interesting to see what the response is after you open a support case. I hope you'll be able to share that with the list. Jeff
After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
Greetings! I built a trigger fired process that copies an "order" from our production database to our dev database. An order, in this case, is an initial row from a table and all of the rows in all of the tables in that database/schema that are needed to satisfy all of the foreign key constraints for the original insert. Through a web page, one of our folks can select a schema and an order id to copy. That information is then inserted into a table. A trigger attached to that table takes care of copying the necessary rows using a function that uses both plython3u and psycopg2. I can supply the source code if that will help. On postgresql 10 using plpython2, this function worked great. After migration to 15 (now 15.5) and a switch to plpython3 (no code change needed inside the function) logging inside the function tells me that everything completes except the exit. I then get this error: NOTICE: update cargotel_common.copy_orders_to_dev set copy_completed = 't', copy_completed_timestamp = clock_timestamp() where id = 21 ERROR: cannot commit while a portal is pinned What the heck? I did find this error inside the source code. This is from 15.6 source: jross@workstation:~/postgresql-15.6$ grep -R -C20 "cannot commit while a portal is pinned" * src/backend/utils/mmgr/portalmem.c- */ src/backend/utils/mmgr/portalmem.c-bool src/backend/utils/mmgr/portalmem.c-PreCommit_Portals(bool isPrepare) src/backend/utils/mmgr/portalmem.c-{ src/backend/utils/mmgr/portalmem.c- bool result = false; src/backend/utils/mmgr/portalmem.c- HASH_SEQ_STATUS status; src/backend/utils/mmgr/portalmem.c- PortalHashEnt *hentry; src/backend/utils/mmgr/portalmem.c- src/backend/utils/mmgr/portalmem.c- hash_seq_init(&status, PortalHashTable); src/backend/utils/mmgr/portalmem.c- src/backend/utils/mmgr/portalmem.c- while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL) src/backend/utils/mmgr/portalmem.c- { src/backend/utils/mmgr/portalmem.c- Portal portal = hentry->portal; src/backend/utils/mmgr/portalmem.c- src/backend/utils/mmgr/portalmem.c- /* src/backend/utils/mmgr/portalmem.c- * There should be no pinned portals anymore. Complain if someone src/backend/utils/mmgr/portalmem.c- * leaked one. Auto-held portals are allowed; we assume that whoever src/backend/utils/mmgr/portalmem.c- * pinned them is managing them. src/backend/utils/mmgr/portalmem.c- */ src/backend/utils/mmgr/portalmem.c- if (portal->portalPinned && !portal->autoHeld) src/backend/utils/mmgr/portalmem.c: elog(ERROR, "cannot commit while a portal is pinned"); src/backend/utils/mmgr/portalmem.c- src/backend/utils/mmgr/portalmem.c- /* src/backend/utils/mmgr/portalmem.c- * Do not touch active portals --- this can only happen in the case of src/backend/utils/mmgr/portalmem.c- * a multi-transaction utility command, such as VACUUM, or a commit in src/backend/utils/mmgr/portalmem.c- * a procedure. src/backend/utils/mmgr/portalmem.c- * src/backend/utils/mmgr/portalmem.c- * Note however that any resource owner attached to such a portal is src/backend/utils/mmgr/portalmem.c- * still going to go away, so don't leave a dangling pointer. Also src/backend/utils/mmgr/portalmem.c- * unregister any snapshots held by the portal, mainly to avoid src/backend/utils/mmgr/portalmem.c- * snapshot leak warnings from ResourceOwnerRelease(). 
src/backend/utils/mmgr/portalmem.c- */ src/backend/utils/mmgr/portalmem.c- if (portal->status == PORTAL_ACTIVE) src/backend/utils/mmgr/portalmem.c- { src/backend/utils/mmgr/portalmem.c- if (portal->holdSnapshot) src/backend/utils/mmgr/portalmem.c- { src/backend/utils/mmgr/portalmem.c- if (portal->resowner) src/backend/utils/mmgr/portalmem.c- UnregisterSnapshotFromOwner(portal->holdSnapshot, src/backend/utils/mmgr/portalmem.c- portal->resowner); src/backend/utils/mmgr/portalmem.c- portal->holdSnapshot = NULL; src/backend/utils/mmgr/portalmem.c- } Do I have any idea of how to fix this after reading this bit of code? No. This error has been reported here before on December 2, 2022 and in other places as well. https://www.postgresql.org/message-id/1061909348.200334.1669970706749%40mail.yahoo.com No responses though to this message though. In the code this: src/backend/utils/mmgr/portalmem.c- /* src/backend/utils/mmgr/portalmem.c- * There should be no pinned portals anymore. Complain if someone src/backend/utils/mmgr/portalmem.c- * leaked one. Auto-held portals are allowed; we assume that whoever src/backend/utils/mmgr/portalmem.c- * pinned them is managing them. src/backend/utils/mmgr/portalmem.c- */ makes me wonder if this error is specific to plpython3? I can think of a way to maybe workaround this but thi
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/20/24 16:25, Adrian Klaver wrote: On 3/20/24 15:18, Jeff Ross wrote: Greetings! I built a trigger fired process that copies an "order" from our production database to our dev database. An order, in this case, is an initial row from a table and all of the rows in all of the tables in that database/schema that are needed to satisfy all of the foreign key constraints for the original insert. Through a web page, one of our folks can select a schema and an order id to copy. That information is then inserted into a table. A trigger attached to that table takes care of copying the necessary rows using a function that uses both plython3u and psycopg2. I can supply the source code if that will help. I think that will help, especially the interaction between psycopg2 and plpython3u. As requested: https://openvistas.net/copy_orders_to_dev.html Jeff
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/20/24 16:50, Tom Lane wrote: Jeff Ross writes: I then get this error: NOTICE: update cargotel_common.copy_orders_to_dev set copy_completed = 't', copy_completed_timestamp = clock_timestamp() where id = 21 ERROR: cannot commit while a portal is pinned Would you mind supplying a self-contained example that triggers this? regards, tom lane I'll see if I can come up with something. Thanks for the reply! Jeff
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/20/24 17:13, Adrian Klaver wrote: On 3/20/24 15:52, Jeff Ross wrote: On 3/20/24 16:25, Adrian Klaver wrote: On 3/20/24 15:18, Jeff Ross wrote: Greetings! I built a trigger fired process that copies an "order" from our production database to our dev database. An order, in this case, is an initial row from a table and all of the rows in all of the tables in that database/schema that are needed to satisfy all of the foreign key constraints for the original insert. Through a web page, one of our folks can select a schema and an order id to copy. That information is then inserted into a table. A trigger attached to that table takes care of copying the necessary rows using a function that uses both plython3u and psycopg2. I can supply the source code if that will help. I think that will help, especially the interaction between psycopg2 and plpython3u. As requested: https://openvistas.net/copy_orders_to_dev.html 1) I have not gone through this thoroughly enough to figure out what is going on. 2) Things I have noticed, may not be relevant. a) from psycopg2 import sql Never used. I have a base template that I use for complex functions--this is a part of that. Never caused a problem before. b) #prod_database_connection.set_session(autocommit=True) #dev_database_connection.set_session(autocommit=True) Why are they commented out? If autocommit is on, the following fails: NOTICE: 217, create temp table if not exists load_temp (like wholesale.load including all) on commit drop; truncate load_temp; ERROR: psycopg2.errors.UndefinedTable: relation "load_temp" does not exist c) prod_database_connection_string = "host='pgbouncer' dbname='%s' application_name = '%s'" dev_database_connection_string = "host='pgbouncer' dbname='%s' application_name = '%s'" What version of PgBouncer? [rossj@cron ~]$ pgbouncer -V PgBouncer 1.21.0 libevent 2.0.21-stable adns: evdns2 tls: OpenSSL 1.0.2k-fips 26 Jan 2017 How is it setup? session mode? d) Why predefine all those cursors()? Why not? Sort of part of the template and the psycopg2 docs say that cursors are cheap. e) Why is database global? I think that's also a holdover from a previous function that I imported into this. Thanks for the reply--gave me a good chance to do a little cleanup. The error is happening when psycopg2 is trying to commit so I'll also ask there. Jeff
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/20/24 17:04, Tom Lane wrote: Adrian Klaver writes: Haven't had a chance to go through this yet. I'm going to say though that Tom Lane is looking for a shorter generic case that anyone could run on their system. Yeah, it's a long way from that trigger function definition to a working (i.e. failing) example. Shortening the trigger might help by eliminating some parts of the infrastructure that would need to be shown --- but nobody's going to try to reverse-engineer all that. regards, tom lane It took some digging but I've found a very simple fix to this. Somewhere (sorry, can't find it again) I read that a postgresql cursor is sometimes referenced as "portal". This was when I was still pretty sure that this was a psycopg2 issue. Further testing ruled that out--I wasn't getting the error on the psycopg2 commit statements, I was getting the error when the plpython3u function itself exits and tries to commit. I only use one plpython3u cursor in that function. The plpython docs say: "Cursors are automatically disposed of. But if you want to explicitly release all resources held by a cursor, use the close method. Once closed, a cursor cannot be fetched from anymore." https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3 Perhaps "pinned" in the error message means "open"? I added a cursor.close() as the last line called in that function and it works again. I haven't been able to come up with a test case that throws the same error, though, so I consider this a solution to what is very likely an odd corner case. Jeff
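In case it helps someone else, the shape of the fix boils down to this (table and function names are placeholders, not the real copy function):

create or replace function demo_cursor_close() returns void
language plpython3u as $$
cur = plpy.cursor("select id from some_table order by id")
while True:
    rows = cur.fetch(100)
    if not rows:
        break
    # ... process rows ...
cur.close()   # releases the pinned portal before the function returns and commits
$$;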
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 15:44, Tom Lane wrote: Perhaps "pinned" in the error message means "open"? No, it means "pinned" ... but I see that plpython pins the portal underlying any PLyCursor object it creates. Most of our PLs do that too, to prevent a portal from disappearing under them (e.g. if you were to try to close the portal directly from SQL rather than via whatever mechanism the PL wants you to use). I added a cursor.close() as the last line called in that function and it works again. It looks to me like PLy_cursor_close does pretty much exactly the same cleanup as PLy_cursor_dealloc, including unpinning and closing the underlying portal. I'm far from a Python expert, but I suspect that the docs you quote intend to say "cursors are disposed of when Python garbage-collects them", and that the reason your code is failing is that there's still a reference to the PLyCursor somewhere after the plpython function exits, perhaps in a Python global variable. regards, tom lane Thank you for your reply, as always, Tom! Debugging at this level might well be over my paygrade ;-) I just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community. Jeff
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 17:41, Adrian Klaver wrote: On 3/27/24 16:35, Rob Sargent wrote: On 3/27/24 17:05, Jeff Ross wrote: On 3/27/24 15:44, Tom Lane wrote: Perhaps "pinned" in the error message means "open"? No, it means "pinned" ... but I see that plpython pins the portal underlying any PLyCursor object it creates. Most of our PLs do that too, to prevent a portal from disappearing under them (e.g. if you were to try to close the portal directly from SQL rather than via whatever mechanism the PL wants you to use). I added a cursor.close() as the last line called in that function and it works again. It looks to me like PLy_cursor_close does pretty much exactly the same cleanup as PLy_cursor_dealloc, including unpinning and closing the underlying portal. I'm far from a Python expert, but I suspect that the docs you quote intend to say "cursors are disposed of when Python garbage-collects them", and that the reason your code is failing is that there's still a reference to the PLyCursor somewhere after the plpython function exits, perhaps in a Python global variable. regards, tom lane Thank you for your reply, as always, Tom! Debugging at this level might well be over my paygrade ;-) I just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community. Jeff My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor. I would start with: def logging(comment): global database <...> Already removed that--thanks, though.
Re: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function
On 3/27/24 17:35, Rob Sargent wrote: On 3/27/24 17:05, Jeff Ross wrote: On 3/27/24 15:44, Tom Lane wrote: Perhaps "pinned" in the error message means "open"? No, it means "pinned" ... but I see that plpython pins the portal underlying any PLyCursor object it creates. Most of our PLs do that too, to prevent a portal from disappearing under them (e.g. if you were to try to close the portal directly from SQL rather than via whatever mechanism the PL wants you to use). I added a cursor.close() as the last line called in that function and it works again. It looks to me like PLy_cursor_close does pretty much exactly the same cleanup as PLy_cursor_dealloc, including unpinning and closing the underlying portal. I'm far from a Python expert, but I suspect that the docs you quote intend to say "cursors are disposed of when Python garbage-collects them", and that the reason your code is failing is that there's still a reference to the PLyCursor somewhere after the plpython function exits, perhaps in a Python global variable. regards, tom lane Thank you for your reply, as always, Tom! Debugging at this level might well be over my paygrade ;-) I just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community. Jeff My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor. Yes, my read was the same. There are exactly 3 references to that cursor now that I added the close() at the end. Here are the first 2 (cursor renamed from the code I posted): plpy_cursor = plpy.cursor(schemas_query) while True: schema_rows = plpy_cursor.fetch(100) The last is: plpy_cursor.close() I don't know how to proceed further.
Re: Faster data load
On 9/5/24 14:14, Lok P wrote: Hi, We are having a requirement to create approx 50 billion rows in a partition table(~1 billion rows per partition, 200+gb size daily partitions) for a performance test. We are currently using ' insert into select.. From or ;' method . We have dropped all indexes and constraints First and then doing the load. Still it's taking 2-3 hours to populate one partition. Is there a faster way to achieve this? Few teammate suggesting to use copy command and use file load instead, which will be faster. So I wanted to understand, how different things it does behind the scenes as compared to insert as select command? As because it only deals with sql engine only. Additionally, when we were trying to create indexes post data load on one partition, it took 30+ minutes. Any possible way to make it faster? Is there any way to drive the above things in parallel by utilizing full database resources? It's postgres 15.4 Regards Lok Try pg_bulkload to load the data--takes a little set up but it is very fast. Do pay attention to the caveats. For a performance test they probably won't be relevant. https://github.com/ossc-db/pg_bulkload?tab=readme-ov-file Jeff
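For comparison with the insert-as-select approach, the plain COPY path the teammates suggested is just this (file path, partition name and format are assumptions):

copy part_20240905 from '/data/loads/part_20240905.csv' with (format csv);

As I understand it, pg_bulkload goes a step further and loads more directly than COPY does, which is exactly why it comes with the caveats mentioned in its README.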
Removing a subscription that does not exist
Hello, I'm working with an RDS instance running 12 that has an old subscription that I can't seem to drop. The logs show this, repeating every 5 seconds or so. 2021-07-09 16:08:07 UTC::@:[1637]:LOG: logical replication apply worker for subscription "metro" has started 2021-07-09 16:08:07 UTC::@:[1637]:ERROR: could not connect to the publisher: could not connect to server: Connection refused Is the server running on host "dbp3" (108.200.30.101) and accepting TCP/IP connections on port 5433? dbp3 is long gone--the server no long exists. It shows up here: mirror_admin@metro_logical> select * from pg_subscription; oid │ subdbid │ subname │ subowner │ subenabled │ subconninfo │ subslotname │ subsynccommit │ subpublications ───┼─┼─┼──┼┼───┼─┼───┼ 83645 │ 66754 │ cargowel_common │ 16394 │ t │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod │ cargowel_common │ off │ {cargowel_common_prod} 83646 │ 66754 │ metro_prod │ 16394 │ t │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod │ metro_prod │ off │ {metro_prod} 51490 │ 14313 │ metro │ 16394 │ t │ dbname=metro host=dbp3 port=5433 user=repmgr │ metro │ off │ {metro} (3 rows) Time: 28.627 ms But not in here: mirror_admin@metro_logical> \dRs+ List of subscriptions Name │ Owner │ Enabled │ Publication │ Synchronous commit │ Conninfo ─┼──┼─┼┼┼─── cargowel_common │ mirror_admin │ t │ {cargowel_common_prod} │ off │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod metro_prod │ mirror_admin │ t │ {metro_prod} │ off │ host=108.200.30.103 port=5433 user=postgres dbname=metro_prod (2 rows) And it can't be disabled or dropped: mirror_admin@metro_logical> alter subscription metro disable; ERROR: subscription "metro" does not exist Time: 24.263 ms mirror_admin@metro_logical> drop subscription metro; ERROR: subscription "metro" does not exist Time: 23.648 ms I did try deleting it directly from the pg_subscription table but that failed with a permission denied error. My suspicion is that's because of the RDS environment. What else can I try to remove this old non-functional subscription? Thanks, Jeff Ross
Re: Removing a subscription that does not exist
On 7/11/21 7:38 PM, Kyotaro Horiguchi wrote: At Fri, 9 Jul 2021 10:49:46 -0600, Jeff Ross wrote in Hello, I'm working with an RDS instance running 12 that has an old subscription that I can't seem to drop. ... Look at the subdbid field in the first query result. You were logging into the databsae with OID=66754 and the subscription "metro" belongs to the database 14313. The second command doesn't show metro which is not of the current database. | What else can I try to remove this old non-functional subscription? ... Thus you need to log in to the databse OID=14313 to manipulate on the subsciption metro. regards. That was it exactly. Once I connected to that database the subscription could be disabled, its slot name set to None and finally dropped. Thank you! Jeff
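For the archives, the sequence that ended up working, roughly (the OID comes from the subdbid column in the earlier pg_subscription query):

-- find which database the orphaned subscription actually belongs to
select datname from pg_database where oid = 14313;
-- then, connected to that database:
alter subscription metro disable;
alter subscription metro set (slot_name = none);
drop subscription metro;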
NOTIFY queue is at 66% and climbing...
Hi all, On 10.15 I'm getting the following on a logically replicated server. From the CSV logs: 2021-10-13 18:49:39.792 EDT,,,213601,,6143c257.34261,64243,,2021-09-16 18:16:55 EDT,4/3914851,60709901,WARNING,01000,"NOTIFY queue is 66% full","" 2021-10-13 18:49:46.058 EDT,,,213601,,6143c257.34261,64244,,2021-09-16 18:16:55 EDT,4/3914855,60709905,WARNING,01000,"NOTIFY queue is 66% full","" 2021-10-13 18:49:51.934 EDT,,,213601,,6143c257.34261,64245,,2021-09-16 18:16:55 EDT,4/3914862,60709915,WARNING,01000,"NOTIFY queue is 66% full","" 2021-10-13 18:50:00.516 EDT,,,213601,,6143c257.34261,64246,,2021-09-16 18:16:55 EDT,4/3914864,60709917,WARNING,01000,"NOTIFY queue is 66% full","" 2021-10-13 18:50:08.003 EDT,,,213601,,6143c257.34261,64247,,2021-09-16 18:16:55 EDT,4/3914871,60709926,WARNING,01000,"NOTIFY queue is 66% full","" The very fine docs for 10 say "There is a queue that holds notifications that have been sent but not yet processed by all listening sessions. If this queue becomes full, transactions calling NOTIFY will fail at commit. The queue is quite large (8GB in a standard installation) and should be sufficiently sized for almost every use case. However, no cleanup can take place if a session executes LISTEN and then enters a transaction for a very long time. Once the queue is half full you will see warnings in the log file pointing you to the session that is preventing cleanup. In this case you should make sure that this session ends its current transaction so that cleanup can proceed." We do have a client that has access to this logically replicated server and I have no idea what they have done as regards LISTEN/NOTIFY, and my query to them about this has not yielded a response. In the CSV logs above what part points to "the session that is preventing cleanup" so that I can kill it? pg_stat_activity yields no clues. Thanks, Jeff
Re: NOTIFY queue is at 66% and climbing...
On 10/13/21 5:50 PM, Tom Lane wrote: Jeff Ross writes: On 10.15 I'm getting the following on a logically replicated server. 2021-10-13 18:49:39.792 EDT,,,213601,,6143c257.34261,64243,,2021-09-16 18:16:55 EDT,4/3914851,60709901,WARNING,01000,"NOTIFY queue is 66% full","" In the CSV logs above what part points to "the session that is preventing cleanup" so that I can kill it? Normally there's a DETAIL entry citing the session's PID. Looking at the code, the reason for the lack of any such entry must be that there is no session whose current notify queue position exactly matches the supposed global minimum position. This corresponds to a known bug that was fixed in 10.16, so I'd suggest upgrading. As a temporary workaround you could restart that server, but likely the problem would recur after awhile. regards, tom lane Thanks as always, Tom. I'll schedule the upgrade to 10.18 and restart for this weekend. Jeff
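For later readers on versions with that fix, a couple of standard queries for keeping an eye on this (nothing specific to our cluster):

-- fraction of the NOTIFY queue currently in use (0.0 to 1.0)
select pg_notification_queue_usage();
-- long-lived open transactions that can prevent queue cleanup
select pid, usename, state, xact_start, query
from pg_stat_activity
where state in ('idle in transaction', 'idle in transaction (aborted)')
order by xact_start;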
Re: Query performance issue
On 10/21/24 23:31, yudhi s wrote: On Thu, Oct 17, 2024 at 3:06 AM Peter J. Holzer wrote: The execution plan looks like a postgresql execution plan, not a mysql execution plan. Did you run this query on postgresql? That may be interesting for comparison purposese, but ultimately it is useless: You won't get mysql to work like postgresql, and any tips to speed up this query on postgresql (which is all you can expect on a postgresql mailing list) probably won't work on mysql. Tried running the same in postgres and below is the plan from bothe postgres and mysql. Can you please guide me to understand , if anything else can be done to make it better? https://gist.github.com/databasetech0073/746353a9e76d5e29b2fc6abdc80cdef8 Here's your postgres query as analyzed and made more readable by the most excellent explain.depesz.com. https://explain.depesz.com/s/VyeM#html Under the hints tab are suggestions to bump your work_mem to avoid writing sorts out to disk. Jeff
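As a concrete sketch of that hint (the 256MB figure is a guess; size it to the external sort reported in the plan, and set it per session rather than globally):

set work_mem = '256MB';
explain (analyze, buffers)
select ... ;   -- the query from the gist
reset work_mem;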
Re: Removing terminal period from varchar string in table column
On 7/15/25 11:30, Rich Shepard wrote: I want to remove the terminal period '.' from the varchar strings in the 'company_name' column in all rows with that period in the companies table. I've looked at trim(), translate(), "substr(company_name 1, length(compan_name) - 1)", and a couple of other functions and am unsure how best to do this without corrupting the database table. Advice needed. TIA, Rich How about test: select company_name, replace(company_name,'.','') from companies; update: update companies set company_name = replace(company_name,'.','') where company_name like '%.'; ? Jeff
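One caveat on the sketch above: replace(company_name, '.', '') strips every period, not just the terminal one. If any names contain interior periods worth keeping, a variant that trims only the final character (same WHERE clause, so only rows ending in a period are touched):

update companies
   set company_name = left(company_name, -1)
 where company_name like '%.';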