Re: libpq: Which functions may hang due to network issues?
On 5. Dec 2021, at 21:32, Daniel Frey wrote:
>
>> On 5. Dec 2021, at 17:01, Tom Lane wrote:
>>
>> Daniel Frey writes:
>>> With all that said, I think that PostgreSQL/libpq should have a clear,
>>> documented way to get rid of a connection that is guaranteed to not hang.
>>> It has something similar for almost all other methods like opening
>>> connections, sending requests, retrieving results. Why stop there?
>>
>> AFAICS, PQfinish() already acts that way, at least up to the same level of
>> guarantee as you have for "all other methods". That is, if you previously
>> set the connection into nonblock mode, it won't block.

One more question about this: What is the purpose of *not* using nonblocking mode with PQfinish()? Is there any benefit to the user in waiting for something? Or could it make sense for PQfinish() to always use nonblocking mode internally?
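For context, the pattern Tom describes would look something like this in application code (my sketch, not from the docs; error handling omitted):

  /* Close a connection without risking a hang on a dead network:
   * set nonblocking mode first, so PQfinish's attempt to flush the
   * Terminate message cannot block. */
  #include <libpq-fe.h>

  static void close_connection_nonblocking(PGconn *conn)
  {
      if (conn == NULL)
          return;
      PQsetnonblocking(conn, 1);  /* return value ignored; we are closing anyway */
      PQfinish(conn);             /* frees the PGconn; conn must not be used afterwards */
  }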
Are Foreign Keys Disabled During Logical Replication Initial Sync?
Hi all, I understand that during logical replication initial sync, tables are copied in full using some type of "copy command". Does that mean each table is populated in a single transaction? If so, when I have tables with foreign keys between them, is it guaranteed that logical replication will populate the tables in the proper order so the foreign keys will be enforced? Or are the foreign keys disabled during initial sync, and only re-enabled after all tables are populated? Thanks!
Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
I keep running into problems like these: Devs are using an ORM. It really likes to produce queries like:

SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;

I don't know why they do this. Usually it's more like 50 for pagination, which makes more sense. But for whatever reason this keeps coming up.

The table has nearly 29 million records. 5069 of them match shipment_import_id = 5090609. There is an index on shipment_import_id, which the planner happily uses without the LIMIT specifically. Yet with it the query planner will always do something like:

# explain SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.44..873.35 rows=1 width=243)
   ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..517.70 rows=5868 width=243)
         Filter: (shipment_import_id = 5090609)

.. which takes minutes.

I know I can work around this. Generally I would just drop the index on shipment_import_id and create one on (shipment_import_id, id). Or if I can get the devs to wrap their query in an inner select with a fake offset to fool the query planner, that works too (both are sketched below). But both seem hacky. Just wondering if there's a knob I can turn to make these more likely to work without constantly implementing workarounds? Thanks for any help.
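For reference, this is roughly what both workarounds look like (sketches only; the index name is made up, table and column names are from the query above):

  -- Workaround 1: a composite index that serves the filter and the ORDER BY together.
  -- CONCURRENTLY avoids locking the 29-million-row table during the build.
  CREATE INDEX CONCURRENTLY shipment_import_records_import_id_id_idx
      ON shipment_import_records (shipment_import_id, id);

  -- Workaround 2: hide the ORDER BY/LIMIT behind a subquery with a fake OFFSET,
  -- so the planner cannot combine them with the primary-key index.
  SELECT *
  FROM (SELECT *
        FROM shipment_import_records
        WHERE shipment_import_id = 5090609
        OFFSET 0) AS x
  ORDER BY id ASC
  LIMIT 1;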
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
What is your default_statistics_target, and how accurate is that estimate of 5868 rows? What is random_page_cost set to, by the way? More importantly, what is the better plan that you'd like the planner to use with your existing indexes? It would seem logical to me to scan for the matching shipment_import_id if the estimate is saying 5868 out of 29 million should match, and then sort and only get the smallest ID. Doing an index scan on ID and looking up in the table to see if shipment_import_id matches, when the planner expects that to be about a .0001 chance per row... I can't imagine that plan performing well at all. Certainly a composite index would be very helpful here. Using explain analyze and sharing the output would give more info to go on.
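For instance (a sketch, reusing the table name from your query), this would show both settings and how far the estimate is from reality:

  SHOW default_statistics_target;
  SHOW random_page_cost;

  -- "rows=" in the plan is the estimate; "actual ... rows=" is the truth.
  EXPLAIN ANALYZE
  SELECT *
  FROM shipment_import_records
  WHERE shipment_import_id = 5090609;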
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On 12/6/21 10:03 AM, Alan Hodgson wrote:
> I keep running into problems like these: Devs are using an ORM. It really
> likes to produce queries like:
>
> SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE
> "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
> "shipment_import_records"."id" ASC LIMIT 1;
>
> I don't know why they do this. Usually it's more like 50 for pagination,
> which makes more sense. But for whatever reason this keeps coming up.

To be clear, is it the devs or the ORM that's adding the ORDER and the LIMIT? I'm betting on devs. Do they need the smallest id (first occurrence?) or do they need data common to all 5069 entries (Name?) and any record will do? For the former they might be better off asking for just the attributes they need, and for the latter you need to provide an option which gets them that single record. Of course, if they have the "smallest id" in hand they should request that.
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On Mon, 6 Dec 2021 at 18:03, Alan Hodgson wrote:
...
> The table has nearly 29 million records. 5069 of them match
> shipment_import_id = 5090609. There is an index on shipment_import_id, which
> the planner happily uses without the LIMIT specifically. Yet with it the
> query planner will always do something like:
>
> # explain SELECT "shipment_import_records".* FROM "shipment_import_records"
> WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
> "shipment_import_records"."id" ASC LIMIT 1;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Limit  (cost=0.44..873.35 rows=1 width=243)
>    ->  Index Scan using shipment_import_records_pkey on
>        shipment_import_records  (cost=0.44..517.70 rows=5868 width=243)
>          Filter: (shipment_import_id = 5090609)
>
> .. which takes minutes.

Can you post an explain analyze? To me it seems like the planner thinks shipment_import_id is randomly distributed and the table is well correlated with its PK, so scanning it for the first id should be fast.

But from the names of the fields you may have a correlation between shipment_import_id and id hidden somewhere (like they are two serials growing together: you query for the latest shipment ids and it scans all the table). An explain analyze should show that (or three: one for that shipment import id, one for 1, one for a really big one).

> Just wondering if there's a knob I can turn to make these more likely to work
> without constantly implementing workarounds?

You may try a composite index.

Francisco Olarte.
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On Mon, 2021-12-06 at 10:18 -0700, Michael Lewis wrote:
> What is your default_statistics_target, and how accurate is that
> estimate of 5868 rows? What is random_page_cost set to, by the way?

default_statistics_target = 1000
random_page_cost = 2.0 (it's on AWS on a 9000 iops gp2 volume)
PostgreSQL 13.5, btw.

The estimate was reasonably accurate; there were 5069 actual rows matching.

> More importantly, what is the better plan that you'd like the planner
> to use with your existing indexes?

Well, it takes a few ms to grab all 5000 rows by shipment_import_id and then sort/limit them. It takes 30 seconds to do what it is doing instead, and that's only when the table is mostly cached already; more like 4-5 minutes otherwise.

# explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.44..873.08 rows=1 width=243) (actual time=31689.725..31689.726 rows=1 loops=1)
   ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5122405.71 rows=5870 width=243) (actual time=31689.723..31689.724 rows=1 loops=1)
         Filter: (shipment_import_id = 5090609)
         Rows Removed by Filter: 28710802
 Planning Time: 0.994 ms
 Execution Time: 31689.744 ms
(6 rows)

Just with a kludge to force the better index:

# explain analyze SELECT * FROM (SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5090609 OFFSET 0) AS x ORDER BY "id" ASC LIMIT 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=10655.34..10655.34 rows=1 width=243) (actual time=4.868..4.869 rows=1 loops=1)
   ->  Sort  (cost=10655.34..10670.02 rows=5870 width=243) (actual time=4.867..4.868 rows=1 loops=1)
         Sort Key: shipment_import_records.id
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using index_shipment_import_records_on_shipment_import_id on shipment_import_records  (cost=0.44..10567.29 rows=5870 width=243) (actual time=0.037..3.560 rows=5069 loops=1)
               Index Cond: (shipment_import_id = 5090609)
 Planning Time: 0.135 ms
 Execution Time: 4.885 ms
(8 rows)

> Certainly a composite index would be very helpful here. Using explain
> analyze and sharing the output would give more info to go on.

Yeah, I am going to just do the composite index for now, but was hoping for a more generic option. Thanks for looking at it.
Re: CTE Materialization
Thanks a lot, completely forgot that one! Gonna test that tomorrow...

On Thu, Dec 2, 2021 at 11:34 PM David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Thursday, December 2, 2021, Paul van der Linden <paul.doskabou...@gmail.com> wrote:
>> Hi,
>>
>> when switching to postgres 14 (from 11) I'm having some slow queries
>> because of inlining of CTEs.
>> I know I can get the same result as with PG11 by adding MATERIALIZED to
>> the CTE, but the same application also needs to be able to run on older
>> postgres versions, so that is a no-go.
>> Is there any other way that I can have materialized CTEs in PG14 while
>> still being compatible with older PG versions?
>> Much appreciated,
>
> The usual anti-inlining hack is to add an "offset 0" to the query.
> Haven't tried it in 14 myself though.
>
> David J.
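For anyone finding this thread later, a sketch of the two forms side by side (hypothetical table and columns; the OFFSET 0 variant is the untested-on-14 hack David describes):

  -- PG12+ only: explicit materialization.
  WITH totals AS MATERIALIZED (
      SELECT customer_id, sum(amount) AS total
      FROM orders
      GROUP BY customer_id
  )
  SELECT * FROM totals WHERE total > 100;

  -- Version-portable: OFFSET 0 keeps the planner from flattening the
  -- subquery, so the CTE body is planned separately, as in PG11.
  WITH totals AS (
      SELECT customer_id, sum(amount) AS total
      FROM orders
      GROUP BY customer_id
      OFFSET 0
  )
  SELECT * FROM totals WHERE total > 100;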
Re: Require details on how to find user creation date in PostgreSQL database
Dear Tom, Thank you so much for sharing the information. Regards, M. Sonai Muthu Raja, Managed Delivery Services - DBA Support

From: Tom Lane
Sent: Friday, December 3, 2021 3:51 AM
To: Sonai muthu raja M
Cc: pgsql-general@lists.postgresql.org
Subject: Re: Require details on how to find user creation date in PostgreSQL database

"Sonai muthu raja M" writes:
> Kindly help us to provide the details on how to find the user creation date
> in a PostgreSQL database.

This information is not stored by Postgres, so you can't.

regards, tom lane
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On Mon, 2021-12-06 at 18:20 +0100, Francisco Olarte wrote:
> Can you post an explain analyze? To me it seems like the planner
> thinks shipment_import_id is randomly distributed and the table is
> well correlated with its PK, so scanning it for the first id should
> be fast.

# explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.44..873.08 rows=1 width=243) (actual time=31689.725..31689.726 rows=1 loops=1)
   ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5122405.71 rows=5870 width=243) (actual time=31689.723..31689.724 rows=1 loops=1)
         Filter: (shipment_import_id = 5090609)
         Rows Removed by Filter: 28710802
 Planning Time: 0.994 ms
 Execution Time: 31689.744 ms
(6 rows)

The biggest one (but yes, "earlier"):

# explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 1247888 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=0.44..426.59 rows=1 width=243) (actual time=8007.069..8007.070 rows=1 loops=1)
   ->  Index Scan using shipment_import_records_pkey on shipment_import_records  (cost=0.44..5126628.40 rows=12030 width=243) (actual time=8007.068..8007.068 rows=1 loops=1)
         Filter: (shipment_import_id = 1247888)
         Rows Removed by Filter: 10929193
 Planning Time: 0.584 ms
 Execution Time: 8007.086 ms
(6 rows)

And the smallest/latest, which actually uses the "right" index:

# explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5116174 ORDER BY "shipment_import_records"."id" ASC LIMIT 1;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit  (cost=145.44..145.44 rows=1 width=243) (actual time=0.018..0.018 rows=1 loops=1)
   ->  Sort  (cost=145.44..145.64 rows=79 width=243) (actual time=0.017..0.018 rows=1 loops=1)
         Sort Key: id
         Sort Method: quicksort  Memory: 26kB
         ->  Index Scan using index_shipment_import_records_on_shipment_import_id on shipment_import_records  (cost=0.44..145.05 rows=79 width=243) (actual time=0.013..0.014 rows=1 loops=1)
               Index Cond: (shipment_import_id = 5116174)
 Planning Time: 0.104 ms
 Execution Time: 0.032 ms
(8 rows)

> But from the names of the fields you may have a correlation between
> shipment_import_id and id hidden somewhere (like they are two serials
> growing together: you query for the latest shipment ids and it scans
> all the table). An explain analyze should show that (or three: one
> for that shipment import id, one for 1, one for a really big one).

This is definitely the case. And we are generally looking for newer data for most operations. Thanks for looking at it.
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On Mon, 2021-12-06 at 10:19 -0700, Rob Sargent wrote:
> To be clear, is it the devs or the ORM that's adding the ORDER and the
> LIMIT? I'm betting on devs. Do they need the smallest id (first
> occurrence?) or do they need data common to all 5069 entries (Name?) and
> any record will do? For the former they might be better off asking for
> just the attributes they need, and for the latter you need to provide an
> option which gets them that single record. Of course, if they have the
> "smallest id" in hand they should request that.

That assumes I could figure out what bit of ORM code is generating this, talk to them, and then get them to actually think about what data they're looking for and its impact on the database. :/ Given my 25-year track record with devs, I'm thinking of that as plan B. Hopefully, though, if they're looking for something common to all the records they would look at the parent table instead. I do expect the dev actually specified the order/limit for some reason.

Thank you for the suggestions.
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On 12/6/21 10:02, Alan Hodgson wrote:
> On Mon, 2021-12-06 at 10:19 -0700, Rob Sargent wrote:
>> To be clear, is it the devs or the ORM that's adding the ORDER and the
>> LIMIT? I'm betting on devs. Do they need the smallest id (first
>> occurrence?) or do they need data common to all 5069 entries (Name?) and
>> any record will do? For the former they might be better off asking for
>> just the attributes they need, and for the latter you need to provide an
>> option which gets them that single record. Of course, if they have the
>> "smallest id" in hand they should request that.
>
> That assumes I could figure out what bit of ORM code is generating this,
> talk to them, and then get them to actually think about what data they're
> looking for and its impact on the database. :/ Given my 25-year track
> record with devs, I'm thinking of that as plan B. Hopefully, though, if
> they're looking for something common to all the records they would look at
> the parent table instead. I do expect the dev actually specified the
> order/limit for some reason.

Maybe I'm silly, but why is asking them a Plan B?

> Thank you for the suggestions.

-- Adrian Klaver adrian.kla...@aklaver.com
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On Mon, 6 Dec 2021 at 18:21, Francisco Olarte wrote:
> On Mon, 6 Dec 2021 at 18:03, Alan Hodgson wrote:
> ...
> > The table has nearly 29 million records. 5069 of them match
> > shipment_import_id = 5090609. There is an index on shipment_import_id,
> > which the planner happily uses without the LIMIT specifically. Yet with it
> > the query planner will always do something like:
> >
> > # explain SELECT "shipment_import_records".* FROM
> > "shipment_import_records" WHERE
> > "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
> > "shipment_import_records"."id" ASC LIMIT 1;
> >                               QUERY PLAN
> > -----------------------------------------------------------------------
> >  Limit  (cost=0.44..873.35 rows=1 width=243)
> >    ->  Index Scan using shipment_import_records_pkey on
> >        shipment_import_records  (cost=0.44..517.70 rows=5868 width=243)
> >          Filter: (shipment_import_id = 5090609)
> >
> > .. which takes minutes.
> >
> > Just wondering if there's a knob I can turn to make these more likely to
> > work without constantly implementing workarounds?
>
> You may try a composite index.

+1. These issues can be solved by composite indexes. A low LIMIT distorts the cost estimates, and when the data are not really random, the index scan can run far too long.

> Francisco Olarte.
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On 12/6/21 11:02 AM, Alan Hodgson wrote:
> On Mon, 2021-12-06 at 10:19 -0700, Rob Sargent wrote:
>> To be clear, is it the devs or the ORM that's adding the ORDER and the
>> LIMIT? I'm betting on devs. Do they need the smallest id (first
>> occurrence?) or do they need data common to all 5069 entries (Name?) and
>> any record will do? For the former they might be better off asking for
>> just the attributes they need, and for the latter you need to provide an
>> option which gets them that single record. Of course, if they have the
>> "smallest id" in hand they should request that.
>
> That assumes I could figure out what bit of ORM code is generating this,
> talk to them, and then get them to actually think about what data they're
> looking for and its impact on the database. :/ Given my 25-year track
> record with devs, I'm thinking of that as plan B. Hopefully, though, if
> they're looking for something common to all the records they would look at
> the parent table instead. I do expect the dev actually specified the
> order/limit for some reason.

Until you know what they're after, it's hard to make the correct adjustment. Another index is more overhead and may not be necessary, or even the best solution for getting that one tuple they're after.

> Thank you for the suggestions.

Up thread you hoped for a general solution. The devs writing better queries is your best bet. If you cannot or choose not to talk with the devs, or at least their manager, you'll be chasing these for a long time. Those dealing with ORM code are server-side: they are supposed to be the smart ones! Maybe they can be taught.
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On Mon, 2021-12-06 at 19:22 +0100, Pavel Stehule wrote:
> On Mon, 6 Dec 2021 at 18:21, Francisco Olarte wrote:
> > On Mon, 6 Dec 2021 at 18:03, Alan Hodgson wrote:
> > > # explain SELECT "shipment_import_records".* FROM
> > > "shipment_import_records" WHERE
> > > "shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
> > > "shipment_import_records"."id" ASC LIMIT 1;
> > >                               QUERY PLAN
> > > -----------------------------------------------------------------------
> > >  Limit  (cost=0.44..873.35 rows=1 width=243)
> > >    ->  Index Scan using shipment_import_records_pkey on
> > >        shipment_import_records  (cost=0.44..517.70 rows=5868 width=243)
> > >          Filter: (shipment_import_id = 5090609)
> > > .. which takes minutes.
> > >
> > > Just wondering if there's a knob I can turn to make these more likely to
> > > work without constantly implementing workarounds?
> >
> > You may try a composite index.
>
> +1. These issues can be solved by composite indexes. A low LIMIT distorts
> the cost estimates, and when the data are not really random, the index scan
> can run far too long.

An ugly alternative is to use "ORDER BY id + 0", which prevents PostgreSQL from using the index (sketched below).

Yours, Laurenz Albe
-- Cybertec | https://www.cybertec-postgresql.com
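For reference, applied to the query from this thread the trick would read (a sketch):

  -- "id + 0" is an expression, so the ORDER BY no longer matches the
  -- b-tree index on plain "id"; the planner falls back to the
  -- shipment_import_id index plus a top-N sort.
  SELECT "shipment_import_records".*
  FROM "shipment_import_records"
  WHERE "shipment_import_records"."shipment_import_id" = 5090609
  ORDER BY "shipment_import_records"."id" + 0 ASC
  LIMIT 1;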
Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT
On 12/6/21 22:16, Laurenz Albe wrote:
> An ugly alternative is to use "ORDER BY id + 0", which prevents PostgreSQL
> from using the index.

That was actually the earliest form of Oracle hints. I remember doing exactly that in Oracle 5.1.22 on VAX/VMS.

-- Mladen Gogala, Database Consultant
Tel: (347) 321-1217
https://dbwhisperer.wordpress.com
Re: Are Foreign Keys Disabled During Logical Replication Initial Sync?
On 06.12.21 15:50, Avi Weinberg wrote:
> Does it mean that populating each table is done in a single transaction? If
> so, when I have tables with foreign keys between them, is it guaranteed that
> logical replication will populate the tables in the proper order so the
> foreign keys will be enforced? Or maybe the foreign keys are disabled during
> initial sync and only re-enabled after all tables are populated.

Foreign keys are disabled on logical replication subscribers.
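As I understand it (my reading, not something Peter states here), the apply worker runs with session_replication_role = replica, which keeps the internal foreign-key triggers from firing. The effect can be reproduced by hand with throwaway tables:

  CREATE TABLE parent (id int PRIMARY KEY);
  CREATE TABLE child  (id int PRIMARY KEY,
                       parent_id int REFERENCES parent (id));

  -- In "replica" mode (superuser required), ordinary triggers -- including
  -- the internal triggers that enforce foreign keys -- do not fire:
  SET session_replication_role = replica;
  INSERT INTO child VALUES (1, 999);   -- succeeds although parent 999 does not exist
  RESET session_replication_role;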
Re: CTE Materialization
It did indeed work as expected. Took the query down from over 18 hours to 20 minutes, so a huge win!

Paul

On Thu, Dec 2, 2021 at 11:34 PM David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Thursday, December 2, 2021, Paul van der Linden <paul.doskabou...@gmail.com> wrote:
>> Hi,
>>
>> when switching to postgres 14 (from 11) I'm having some slow queries
>> because of inlining of CTEs.
>> I know I can get the same result as with PG11 by adding MATERIALIZED to
>> the CTE, but the same application also needs to be able to run on older
>> postgres versions, so that is a no-go.
>> Is there any other way that I can have materialized CTEs in PG14 while
>> still being compatible with older PG versions?
>> Much appreciated,
>
> The usual anti-inlining hack is to add an "offset 0" to the query.
> Haven't tried it in 14 myself though.
>
> David J.
Re: Advice on using materialized views
On Mon, 6 Dec 2021 at 18:48, Phil Endecott wrote:
>
> - I have a raw data table, indexed on a timestamp column, to which
>   new data is regularly being added.
>
> - I have a set of views that produce hourly/daily/monthly summaries
>   of this data. Querying these views is slow, so I'd like to make
>   them materialized views.
>
> - But I'd like the most recent data to be included in the results
>   of my queries. So I think I need a combined view that is the
>   union of the materialized view and a non-materialised view for
>   the recent values.

Assuming your table is insert-only: how about, instead of using a materialized view at all, you define a table of hourly summaries which your script updates, and define a view which merges that with an on-the-fly summary of the main-table rows newer than the most recent summarised hour? Something like the sketch below.
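A rough sketch of that layout, with invented table and column names (raw_data(ts, value) standing in for your raw table):

  CREATE TABLE hourly_summary (
      hour      timestamptz PRIMARY KEY,
      n         bigint NOT NULL,
      avg_value double precision
  );

  -- The view unions the precomputed hours with an on-the-fly summary of
  -- anything newer than the last hour the script has filled in.
  CREATE VIEW full_summary AS
  SELECT hour, n, avg_value
  FROM hourly_summary
  UNION ALL
  SELECT date_trunc('hour', ts), count(*), avg(value)
  FROM raw_data
  WHERE ts >= coalesce((SELECT max(hour) + interval '1 hour'
                        FROM hourly_summary),
                       '-infinity')
  GROUP BY 1;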
RE: Are Foreign Keys Disabled During Logical Replication Initial Sync?
Hi Peter, Thanks for the answer. Just to clarify: are they disabled during initial sync only, or always disabled on the subscriber side? Are all triggers disabled during initial sync, or just foreign keys? And how can I know that initial sync has completed for all tables? Is it by checking whether pg_subscription_rel.srsubstate is still 'i' or 'd' for any table, or is there a better way? (A possible catalog query is sketched at the end of this message.) Thanks!

-----Original Message-----
From: Peter Eisentraut [mailto:peter.eisentr...@enterprisedb.com]
Sent: Tuesday, December 7, 2021 7:19 AM
To: Avi Weinberg; pgsql-general@lists.postgresql.org
Subject: Re: Are Foreign Keys Disabled During Logical Replication Initial Sync?

On 06.12.21 15:50, Avi Weinberg wrote:
> Does it mean that populating each table is done in a single
> transaction? If so, when I have tables with foreign keys between
> them, is it guaranteed that logical replication will populate the
> tables in the proper order so the foreign keys will be enforced? Or
> maybe the foreign keys are disabled during initial sync and only
> re-enabled after all tables are populated.

Foreign keys are disabled on logical replication subscribers.
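For the last question, a sketch of the catalog check I have in mind (per my reading of the docs: 'i' = initializing, 'd' = copying data, 's' = synchronized, 'r' = ready):

  -- Tables whose initial sync has not finished yet; zero rows means done.
  SELECT srrelid::regclass AS table_name, srsubstate
  FROM pg_subscription_rel
  WHERE srsubstate NOT IN ('s', 'r');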