Re: Long running query causing XID limit breach
On Sat, May 25, 2024 at 11:00 PM sud wrote:
> But I have one question here: does max_standby_streaming_delay = 14
> mean the queries on the standby will get cancelled after 14 seconds?

No, your query gets cancelled when it stalls replication for >14 sec. If your master is idle and does not send any WAL and the replica has caught up, the query can take as long as it wants.
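On the standby you can watch how often queries are actually cancelled for this reason. A minimal check, using the standard statistics view pg_stat_database_conflicts (run on the standby):

    -- cumulative counts of queries cancelled due to recovery
    -- conflicts, per database
    SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
      FROM pg_stat_database_conflicts;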
Re: Long running query causing XID limit breach
On Sun, May 26, 2024 at 1:43 PM Torsten Förtsch wrote:
> On Sat, May 25, 2024 at 11:00 PM sud wrote:
>> But I have one question here: does max_standby_streaming_delay = 14
>> mean the queries on the standby will get cancelled after 14 seconds?
>
> No, your query gets cancelled when it stalls replication for >14 sec. If
> your master is idle and does not send any WAL and the replica has
> caught up, the query can take as long as it wants.

Thank you so much.

For example, in the scenario below: if an INSERT is running on the primary against the 25th May partition of TABLE1, and at the same time we are selecting data from the 24th May partition, then with max_standby_streaming_delay = 14 the SELECT query is allowed to run for any duration without restriction, even though WAL keeps being applied on the standby. The INSERT on the primary also won't cause the standby SELECT to be cancelled, because the WAL records of the INSERT do not conflict with the exact rows being read on the standby. Is my understanding correct here?

However, if an UPDATE/DELETE is running on the primary against the 25th May partition of TABLE1, and on the exact same set of rows that the SELECT on the standby is reading, then the application of those WAL records on the standby can wait at most 14 seconds, and those SELECT queries are prone to being cancelled after 14 seconds. Is this understanding correct?

If the above is true then it doesn't look good, because in an OLTP system there will be a lot of DML happening on the writer instance, and there may be many queries running on the reader/standby instances that are meant to run for hours. And if letting those SELECT queries run for hours means compromising an hour of "high availability"/RPO, or an hour of lag between primary and standby, that doesn't look good. Please correct me if I am missing something here.
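For reference, a minimal sketch of the scenario I mean (table and partition names are made up for illustration):

    -- on the primary: daily range partitions of TABLE1
    CREATE TABLE table1 (id bigint, created_at date)
        PARTITION BY RANGE (created_at);
    CREATE TABLE table1_p20240524 PARTITION OF table1
        FOR VALUES FROM ('2024-05-24') TO ('2024-05-25');
    CREATE TABLE table1_p20240525 PARTITION OF table1
        FOR VALUES FROM ('2024-05-25') TO ('2024-05-26');

    -- primary: writes land only in the 25th May partition
    INSERT INTO table1 VALUES (1, '2024-05-25');

    -- standby: long-running read against the 24th May partition
    SELECT count(*) FROM table1 WHERE created_at = '2024-05-24';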
scalar plpgsql functions and their stability flags
Good day, experts. A question on the stability flags (IMMUTABLE, STABLE) of scalar plpgsql functions, and how they work in SQL queries. It is clear that for immutable/stable functions with constant parameters the query planner could/should calculate the value at parse time and use it directly in the query, or at least once per query. But it is unclear to me what exactly should/can happen when the parameters are bound not to constant values but to query fields. In such a case some caching mechanism could be involved for combinations of parameter values and their results, like building a hash table or something similar. Can someone give me guidance on this matter, and on what would limit the usefulness of such a mechanism, if it exists? Thank you.
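To make the question concrete, suppose a pair of trivial functions like this (the bodies are placeholders; only the flags differ):

    CREATE FUNCTION f_imm(x int) RETURNS int
    LANGUAGE plpgsql IMMUTABLE AS
    $$ BEGIN RETURN x * 2; END $$;

    CREATE FUNCTION f_stable(x int) RETURNS int
    LANGUAGE plpgsql STABLE AS
    $$ BEGIN RETURN x * 2; END $$;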
Re: scalar plpgsql functions and their stability flags
Victor Dobrovolsky writes:
> It is clear that for immutable/stable functions with constant parameters
> the query planner could/should calculate the value at parse time and use
> it directly in the query, or at least once per query.

Immutable, yes; stable, no. Awhile back there was a draft patch to cache outputs of stable functions after running them once in a query, but I don't think anyone's still working on that. IIRC we were having a hard time convincing ourselves that the extra bookkeeping would pay for itself.

> But it is unclear to me what exactly should/can happen when the parameters
> are bound not to constant values but to query fields. In such a case some
> caching mechanism could be involved for combinations of parameter values
> and their results, like building a hash table or something similar.

No such mechanism exists. Again, there would be a lot of tradeoffs involved, and it's difficult to say if it'd be a win.

regards, tom lane
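The "immutable, yes; stable, no" difference is easy to see with EXPLAIN. Using the hypothetical f_imm/f_stable pair sketched upthread, the output should look something like this: the immutable call is folded to a constant at plan time, the stable call is not:

    EXPLAIN (VERBOSE, COSTS OFF) SELECT f_imm(21);
    --  Result
    --    Output: 42

    EXPLAIN (VERBOSE, COSTS OFF) SELECT f_stable(21);
    --  Result
    --    Output: f_stable(21)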
Re: Long running query causing XID limit breach
On 5/25/24 22:56, David HJ wrote:
> anyone know how to unsubscribe from this mailing list?

See here:

https://lists.postgresql.org/unsubscribe/

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: Long running query causing XID limit breach
On Sun, May 26, 2024 at 11:16 AM sud wrote:
> [...]
>
> For example, in the scenario below: if an INSERT is running on the primary
> against the 25th May partition of TABLE1, and at the same time we are
> selecting data from the 24th May partition, then with
> max_standby_streaming_delay = 14 the SELECT query is allowed to run for
> any duration without restriction, even though WAL keeps being applied on
> the standby. The INSERT on the primary also won't cause the standby SELECT
> to be cancelled, because the WAL records of the INSERT do not conflict
> with the exact rows being read on the standby. Is my understanding correct
> here?
>
> However, if an UPDATE/DELETE is running on the primary against the 25th
> May partition of TABLE1, and on the exact same set of rows that the SELECT
> on the standby is reading, then the application of those WAL records on
> the standby can wait at most 14 seconds, and those SELECT queries are
> prone to being cancelled after 14 seconds. Is this understanding correct?
>
> If the above is true then it doesn't look good, because in an OLTP system
> there will be a lot of DML happening on the writer instance, and there may
> be many queries running on the reader/standby instances that are meant to
> run for hours. And if letting those SELECT queries run for hours means
> compromising an hour of "high availability"/RPO, or an hour of lag between
> primary and standby, that doesn't look good. Please correct me if I am
> missing something here.

Each query on the replica has a backend_xmin. You can see that in pg_stat_activity. From that backend's perspective, tuples marked as deleted by any transaction greater than or equal to backend_xmin are still needed. This does not depend on the table.

Now, vacuum writes to the WAL up to which point it has vacuumed on the master. In pg_waldump this looks like so:

PRUNE snapshotConflictHorizon: 774, nredirected: 0, ndead: 5, nunused: 0, redirected: [], dead: [2, 4, 6, 8, 10], unused: [], blkref #0: rel 1663/5/16430 blk 0

That snapshotConflictHorizon is also a transaction id. If the backend_xmin of all backends running transactions in the same database as the vacuum WAL record (the 5 in 1663/5/16430) is greater than vacuum's snapshotConflictHorizon, then there is no conflict. If any of the backend_xmins is less, then there is a conflict.

This type of conflict is determined by just 2 numbers: the conflict horizon sent by the master in the WAL, and the minimum of all backend_xmins. For your case this means a long running transaction querying table T1 might have a backend_xmin of 223. On the master, update and delete operations happen on table T2. Since all the transactions on the master are fast, when vacuum hits T2, the minimum of all backend_xmins on the master might already be 425. So, garbage left over by all transactions up to 424 can be cleaned up. Now that cleanup record reaches the replica. It compares 223 > 425, which is false. So, there is a conflict. Now the replica can wait until its own horizon reaches 425, or it can kill all backends with a lower backend_xmin.

As I understand, hot_standby_feedback does not work for you. Not sure if you can run the query on the master? That would resolve the issue but might generate the same bloat on the master as hot_standby_feedback. Another option I can see is to run long running queries on a dedicated replica with max_standby_streaming_delay set to infinity or something large enough. If you go that way, you could also fetch the WAL from your WAL archive instead of replicating from the master. That way the replica has absolutely no chance to affect the master.

Good luck!
Torsten
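To watch the relevant horizons, something along these lines works on the replica (pg_stat_activity and age() are standard; the ORDER BY just surfaces the oldest snapshot first):

    -- on the replica: each backend's xmin horizon, oldest first
    SELECT pid, backend_xmin, state, left(query, 40) AS query
      FROM pg_stat_activity
     WHERE backend_xmin IS NOT NULL
     ORDER BY age(backend_xmin) DESC;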
Re: Long running query causing XID limit breach
On Sun, May 26, 2024 at 11:18 PM Torsten Förtsch wrote:
> Each query on the replica has a backend_xmin. You can see that in
> pg_stat_activity. From that backend's perspective, tuples marked as
> deleted by any transaction greater than or equal to backend_xmin are
> still needed. This does not depend on the table.
>
> [...]
>
> As I understand, hot_standby_feedback does not work for you. Not sure if
> you can run the query on the master? That would resolve the issue but
> might generate the same bloat on the master as hot_standby_feedback.
> Another option I can see is to run long running queries on a dedicated
> replica with max_standby_streaming_delay set to infinity or something
> large enough. If you go that way, you could also fetch the WAL from your
> WAL archive instead of replicating from the master. That way the replica
> has absolutely no chance to affect the master.

Thank you so much.

Would you agree that we should have two standbys? One with the default max_standby_streaming_delay (say 10 sec), which will mainly be used for high availability and will thus have minimal lag; and another with max_standby_streaming_delay = -1, i.e. waiting indefinitely for SELECT queries to finish without caring about the lag, which will be used for the long running SELECT queries.

And keep hot_standby_feedback ON for the first standby, which is used for HA/high availability, and OFF for the second standby, which is used for the long running SELECT queries, so that the primary won't wait for feedback from that standby before vacuuming its old transactions. That should keep the transaction id wraparound issue from happening because of read/SELECT queries on either standby.
Re: Long running query causing XID limit breach
On Sun, May 26, 2024 at 8:46 PM sud wrote:
> Would you agree that we should have two standbys? One with the default
> max_standby_streaming_delay (say 10 sec), which will mainly be used for
> high availability and will thus have minimal lag; and another with
> max_standby_streaming_delay = -1, i.e. waiting indefinitely for SELECT
> queries to finish without caring about the lag, which will be used for
> the long running SELECT queries.
>
> And keep hot_standby_feedback ON for the first standby, which is used for
> HA/high availability, and OFF for the second standby, which is used for
> the long running SELECT queries, so that the primary won't wait for
> feedback from that standby before vacuuming its old transactions. That
> should keep the transaction id wraparound issue from happening because of
> read/SELECT queries on either standby.

Sure. That could work. Perhaps also set statement_timeout on the first replica, just in case.
Re: Long running query causing XID limit breach
On Mon, May 27, 2024 at 12:55 AM Torsten Förtsch wrote:
> [...]
>
> Sure. That could work. Perhaps also set statement_timeout on the first
> replica, just in case.

Thank you so much. Yes, planning to set it like below. Hope I am doing it correctly.

Setting                               Master/Primary   First Replica (HA)   Second Replica (Reporting)
hot_standby_feedback                  ON               ON                   OFF
max_standby_streaming_delay           10 sec           10 sec               -1 (infinite)
statement_timeout                     2 hrs            2 hrs                none (infinite)
idle_in_transaction_session_timeout   10 min           10 min               none (infinite)
autovacuum_freeze_max_age             100M             100M                 100M
log_autovacuum_min_duration           0                0                    0
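Expressed as a postgresql.conf sketch for the reporting replica (same values as in the table above; setting the two timeouts to 0 disables them):

    # second replica (reporting): favor long queries over low lag
    hot_standby_feedback = off
    max_standby_streaming_delay = -1          # wait indefinitely for conflicting queries
    statement_timeout = 0                     # disabled
    idle_in_transaction_session_timeout = 0   # disabled
    autovacuum_freeze_max_age = 100000000
    log_autovacuum_min_duration = 0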