Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
On Tue, 8 Sep 2020 at 06:05, Raj wrote:
> This would not exactly look like a bug, because the message says "to
> be locked", so at least it's not allowing two workers to lock the same
> tuple. But it seems that the skip-locked mode should not make an error
> out of this, but treat it as if the tuple was already locked. Why would
> it want to lock the tuple (representing the job) if another worker has
> already finished his UPDATE of the job to mark it as "done" (which is
> what makes the tuple move to the "completed" partition)?

(It's not very clear who wrote the above text since the quote does not
mention who the author is and the original email didn't appear to have
made it to the list)

It's not a bug. I think the quoted text is expecting a bit too much
from the database. It does not know that a tuple which was updated and
moved to another partition can be safely ignored. For all the database
knows, the new version of the tuple in the new partition still matches
the query's WHERE clause and should be locked. If we just ignored
moved-off tuples, we could miss processing tuples that still need to be
processed.

It's perhaps not impossible to make this work slightly better if there
were somehow a way to inform heapam_tuple_lock() that it's operating on
a partition, that the query queried a partitioned table, and that all
but one partition was pruned by partition pruning. In that case we
could be certain the new version of the tuple can't match the WHERE
clause of the SELECT, since partition pruning determined that all other
partitions don't match the WHERE clause. However, that's: a) a pretty
horrid thing to have to teach heapam_tuple_lock() about, and; b) only
going to work when one partition survives partition pruning, which is
pretty horrible since doing ATTACH PARTITION could suddenly cause your
queries to fail randomly.
If you had 3 partitions, one each for "pending", "retry" and
"completed", and you wanted to lock all rows that are in a "pending" or
"retry" state, then when we encounter an updated row in the "pending"
partition, we have no knowledge of whether it was moved into the
"retry" or the "completed" partition. If it's in "retry", then we do
want to find it and process it, but if it's in "completed", then it
does not match the WHERE clause of the query and we can ignore it.
Since we don't know which, we can't make assumptions and must force the
user to try again, hence the serialisation failure error.

> Either the SELECT for jobs to do returned a wrong tuple, which was
> already updated, or there is some lapse in the locking.
>
> Either way it would seem to be a waste of time throwing all these
> errors when the tuple should not even have been selected for update
> and locking.
>
> I wonder if anybody knows anything about that issue? Of course you'll
> want to see the DDL and SQL queries, etc. but you can't really try it
> out unless you do some massively parallel magic.

I already mentioned why this cannot work that way [1]. If you have some
idea on how to make it work correctly, then it would be interesting to
hear. Otherwise, I'm sorry to say that we can't just ignore these
tuples because it happens to suit your use case. The solution is just
to make the application retry on serialisation failures.

David

[1] https://www.postgresql.org/message-id/CAApHDvrDH6TQeLxTqnnAnhjrs55ru5g2_QMG=me+wvd5mmp...@mail.gmail.com
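The scenario and the retry advice could be sketched like this; the
table, column, and partition names are hypothetical, assuming a job
queue partitioned by status as described above:

```sql
-- Hypothetical job queue, list-partitioned by status as in the
-- three-partition scenario described above.
CREATE TABLE job (
    id      bigint NOT NULL,
    status  text   NOT NULL,
    payload text
) PARTITION BY LIST (status);

CREATE TABLE job_pending   PARTITION OF job FOR VALUES IN ('pending');
CREATE TABLE job_retry     PARTITION OF job FOR VALUES IN ('retry');
CREATE TABLE job_completed PARTITION OF job FOR VALUES IN ('completed');

-- Worker dequeue: claim one job that no other worker has locked.  If a
-- concurrent UPDATE moves the row to another partition mid-lock, this
-- fails with SQLSTATE 40001 (serialization_failure, "tuple to be
-- locked was already moved to another partition due to concurrent
-- update"); the application should catch that error and simply re-run
-- the transaction.
BEGIN;
SELECT id, payload
FROM job
WHERE status IN ('pending', 'retry')
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ... do the work, then e.g.:
-- UPDATE job SET status = 'completed' WHERE id = <claimed id>;
COMMIT;
```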
Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
On 2020-Sep-14, David Rowley wrote:
> On Tue, 8 Sep 2020 at 06:05, Raj wrote:
> >
> > This would not exactly look like a bug, because the message says "to
> > be locked", so at least it's not allowing two workers to lock the same
> > tuple. But it seems that the skip-locked mode should not make an error
> > out of this, but treat it as the tuple was already locked. Why would
> > it want to lock the tuple (representing the job) if another worker has
> > already finished his UPDATE of the job to mark it as "done" (which is
> > what makes the tuple move to the "completed" partition.)
>
> (It's not very clear who wrote the above text since the quote does not
> mention who the author is and the original email didn't appear to have
> made it to the list)

Same person. https://postgr.es/m/[email protected]

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Performance Issue (Not using Index when joining two tables).
Hi,
Good Morning!
Postgres Version : 11.6 (AWS Native Postgres/AWS Aurora tried on both
flavours).
When I join the two tables, the primary key index is not being used;
when I use an IN clause with literal values instead, the index is used.
I have reindexed all the tables and run autovacuum as well.
pgwfc01q=> select count(*) from chr_simple_val;
count
---
13158
(1 row)
pgwfc01q=> select count(*) from chr_emp_position;
count
---
228
(1 row)
The primary key of the table chr_simple_val is the oid column, yet the
index is still not being used.
I'm sharing the explain plan here:

pgwfc01q=> explain analyze select
cep.HOME_DEPT_OID,ctc.oid,ctc.category,ctc.code from chr_emp_position cep inner
join chr_Simple_Val ctc on ctc.oid=cep.HOME_DEPT_OID;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
   Hash Cond: ((cep.home_dept_oid)::text = (ctc.oid)::text)
   ->  Seq Scan on chr_emp_position cep  (cost=0.00..2437.77 rows=436 width=11) (actual time=44.713..329.435 rows=228 loops=1)
         Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CHR_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))
         Rows Removed by Filter: 3695
   ->  Hash  (cost=49176.40..49176.40 rows=9881 width=31) (actual time=3467.907..3467.908 rows=13158 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 1031kB
         ->  Seq Scan on chr_simple_val ctc  (cost=0.00..49176.40 rows=9881 width=31) (actual time=2.191..3460.929 rows=13158 loops=1)
               Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_ty_static(vpd_key) AND f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code))
               Rows Removed by Filter: 75771
 Planning Time: 0.297 ms
 Execution Time: 3797.768 ms
(12 rows)
Thank you..
Regards,
Ramesh G
Re: Performance Issue (Not using Index when joining two tables).
On Sun, Sep 13, 2020 at 02:58:15PM +, Gopisetty, Ramesh wrote:
> [...]
> Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
> [...]
> Execution Time: 3797.768 ms
Most of the time (3460ms) is spent in the sequential scan on
chr_simple_val, and the seqscan on chr_emp_position is taking ~330ms.
Combined that's 3790ms out of 3797ms, so the join is pretty much
irrelevant.

Either the seqscans are causing a lot of I/O, or maybe the f_sel_*
functions in the filter are expensive. Judging by how few rows are in
the tables (not sure how large the tables are), I'd guess it's the
latter ... Hard to say without knowing what the functions do etc.
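One quick way to test that guess, using the table and function names
from the quoted plan (to be run against the original poster's database;
the comparison is illustrative, not something I've measured):

```sql
-- Time the filter functions in isolation, with no join involved:
EXPLAIN ANALYZE
SELECT f_sel_policy_ty_static(vpd_key),
       f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code)
FROM chr_simple_val ctc;

-- Compare against a plain scan of the same table with no function calls:
EXPLAIN ANALYZE SELECT vpd_key FROM chr_simple_val;
```

If the first query accounts for most of the 3460ms, the functions (not
I/O) are the bottleneck.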
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Performance Issue (Not using Index when joining two tables).
Tomas Vondra writes:
> Most of the time (3460ms) is spent in the sequential scan on
> chr_simple_val, and the seqscan on chr_emp_position is taking ~330ms.
> Combined that's 3790ms out of 3797ms, so the join is pretty much
> irrelevant.
>
> Either the seqscans are causing a lot of I/O, or maybe the f_sel_*
> functions in the filter are expensive. Judging by how few rows are in
> the tables (not sure how large the tables are), I'd guess it's the
> latter ... Hard to say without knowing what the functions do etc.

I think the OP is wishing that the filter functions for the larger
table would be postponed till after the join condition is applied. I'm
a little dubious that that's going to save anything meaningful; but
maybe increasing the cost attributed to those functions would persuade
the planner to try it that way.

First though, does forcing a nestloop plan (turn off enable_hashjoin,
and enable_mergejoin too if needed) produce the shape of plan you want?
And if so, is it actually faster? Only if those things are true is it
going to be worth messing with costing parameters.

regards, tom lane
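The two experiments suggested above could look like this in a psql
session; the function name is taken from the quoted plan, but its
argument type and the COST value are assumptions for illustration only:

```sql
-- Experiment 1: force a nestloop plan and check whether it is in fact
-- faster than the hash join.
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT cep.home_dept_oid, ctc.oid, ctc.category, ctc.code
FROM chr_emp_position cep
JOIN chr_simple_val ctc ON ctc.oid = cep.home_dept_oid;
RESET enable_hashjoin;
RESET enable_mergejoin;

-- Experiment 2: raise the cost attributed to the filter function so the
-- planner prefers plans that evaluate it fewer times.  The default COST
-- for a PL function is 100; 10000 is just an illustrative large value,
-- and the argument type here is a guess.
ALTER FUNCTION f_sel_policy_ty_static(character varying) COST 10000;
```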
Re: Performance Issue (Not using Index when joining two tables).
> Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[]))
> AND f_sel_policy_all(vpd_key, 'CHR_EMP_POSITION'::character varying) AND
> f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))

This looks like it may be infrastructure for row-level security. My
understanding is limited, but perhaps those restrictions are
influencing the planner's access to, or reliance on, the stats.

Also, it would seem that you need the entire table since you don't have
an explicit WHERE clause. Why would scanning an index and then also
visiting every row in the table be faster than just going directly to
the table?
