Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
On Tue, 8 Sep 2020 at 06:05, Raj wrote:
> This would not exactly look like a bug, because the message says "to
> be locked", so at least it's not allowing two workers to lock the same
> tuple. But it seems that the skip-locked mode should not make an error
> out of this, but treat it as if the tuple was already locked. Why would
> it want to lock the tuple (representing the job) if another worker has
> already finished his UPDATE of the job to mark it as "done" (which is
> what makes the tuple move to the "completed" partition)?

(It's not very clear who wrote the above text since the quote does not
mention who the author is and the original email didn't appear to have
made it to the list)

It's not a bug. I think the quoted text is expecting a bit too much
from the database. It does not know that a tuple which was updated and
moved to another partition can be safely ignored. For all the database
knows, the new version of the tuple in the new partition still matches
the query's WHERE clause and should be locked. If we just ignored
moved-off tuples, we could miss processing tuples that still need to be
processed.

It's perhaps not impossible to make this work slightly better if there
were somehow a way to inform heapam_tuple_lock() that it's operating on
a partition, that the query queried a partitioned table, and that all
but one partition was pruned by partition pruning. In that case we
could be certain the new version of the tuple can't match the WHERE
clause of the SELECT, since partition pruning determined that all other
partitions don't match the WHERE clause. However, that's: a) a pretty
horrid thing to have to teach heapam_tuple_lock() about, and; b) only
going to work when one partition survives partition pruning, which is
pretty horrible since doing ATTACH PARTITION could suddenly cause your
queries to fail randomly.
If you had 3 partitions, one each for "pending", "retry" and
"completed", and you wanted to lock all rows that are in a "pending" or
"retry" state, then when we encounter an updated row in the "pending"
partition, we have no knowledge of whether it was moved into the
"retry" or the "completed" partition. If it's in "retry", then we do
want to find it and process it, but if it's in "completed", then it
does not match the WHERE clause of the query and we can ignore it.
Since we don't know which, we can't make assumptions and must force the
user to try again, hence the serialisation failure error.

> Either the SELECT for jobs to do returned a wrong tuple, which was
> already updated, or there is some lapse in the locking.
>
> Either way it would seem to be a waste of time throwing all these
> errors when the tuple should not even have been selected for update
> and locking.
>
> I wonder if anybody knows anything about that issue? Of course you'll
> want to see the DDL and SQL queries, etc. but you can't really try it
> out unless you do some massively parallel magic.

I already mentioned why this cannot work that way [1]. If you have some
idea on how to make it work correctly, then it would be interesting to
hear. Otherwise, I'm sorry to say that we can't just ignore these
tuples because it happens to suit your use case. The solution is just
to make the application retry on serialisation failures.

David

[1] https://www.postgresql.org/message-id/CAApHDvrDH6TQeLxTqnnAnhjrs55ru5g2_QMG=me+wvd5mmp...@mail.gmail.com
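The scenario and the retry advice could be sketched like this; the
table, column, and partition names are hypothetical, assuming a job
queue partitioned by status as described above:

```sql
-- Hypothetical job queue, list-partitioned by status as in the
-- three-partition scenario described above.
CREATE TABLE job (
    id      bigint NOT NULL,
    status  text   NOT NULL,
    payload text
) PARTITION BY LIST (status);

CREATE TABLE job_pending   PARTITION OF job FOR VALUES IN ('pending');
CREATE TABLE job_retry     PARTITION OF job FOR VALUES IN ('retry');
CREATE TABLE job_completed PARTITION OF job FOR VALUES IN ('completed');

-- Worker dequeue: claim one job that no other worker has locked.  If a
-- concurrent UPDATE moves the row to another partition mid-lock, this
-- fails with SQLSTATE 40001 (serialization_failure, "tuple to be
-- locked was already moved to another partition due to concurrent
-- update"); the application should catch that error and simply re-run
-- the transaction.
BEGIN;
SELECT id, payload
FROM job
WHERE status IN ('pending', 'retry')
LIMIT 1
FOR UPDATE SKIP LOCKED;
-- ... do the work, then e.g.:
-- UPDATE job SET status = 'completed' WHERE id = <claimed id>;
COMMIT;
```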
Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED
On 2020-Sep-14, David Rowley wrote:
> On Tue, 8 Sep 2020 at 06:05, Raj wrote:
> >
> > This would not exactly look like a bug, because the message says "to
> > be locked", so at least it's not allowing two workers to lock the same
> > tuple. But it seems that the skip-locked mode should not make an error
> > out of this, but treat it as the tuple was already locked. Why would
> > it want to lock the tuple (representing the job) if another worker has
> > already finished his UPDATE of the job to mark it as "done" (which is
> > what makes the tuple move to the "completed" partition.)
>
> (It's not very clear who wrote the above text since the quote does not
> mention who the author is and the original email didn't appear to have
> made it to the list)

Same person. https://postgr.es/m/[email protected]

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Performance Issue (Not using Index when joining two tables).
Hi,
Good Morning!
Postgres Version : 11.6 (AWS Native Postgres/AWS Aurora tried on both
flavours).
When I join the two tables, the primary key index is not being used;
when I use an IN clause with literal values instead, the index is used.
I have reindexed all the tables and run autovacuum as well.
pgwfc01q=> select count(*) from chr_simple_val;
count
---
13158
(1 row)
pgwfc01q=> select count(*) from chr_emp_position;
count
---
228
(1 row)
The primary key of the table chr_simple_val is the oid column, yet the
index is still not being used.
I'm sharing the explain plan here:

pgwfc01q=> explain analyze select
cep.HOME_DEPT_OID,ctc.oid,ctc.category,ctc.code from chr_emp_position cep inner
join chr_Simple_Val ctc on ctc.oid=cep.HOME_DEPT_OID;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
   Hash Cond: ((cep.home_dept_oid)::text = (ctc.oid)::text)
   ->  Seq Scan on chr_emp_position cep  (cost=0.00..2437.77 rows=436 width=11) (actual time=44.713..329.435 rows=228 loops=1)
         Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_all(vpd_key, 'CHR_EMP_POSITION'::character varying) AND f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))
         Rows Removed by Filter: 3695
   ->  Hash  (cost=49176.40..49176.40 rows=9881 width=31) (actual time=3467.907..3467.908 rows=13158 loops=1)
         Buckets: 16384  Batches: 1  Memory Usage: 1031kB
         ->  Seq Scan on chr_simple_val ctc  (cost=0.00..49176.40 rows=9881 width=31) (actual time=2.191..3460.929 rows=13158 loops=1)
               Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[])) AND f_sel_policy_ty_static(vpd_key) AND f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code))
               Rows Removed by Filter: 75771
 Planning Time: 0.297 ms
 Execution Time: 3797.768 ms
(12 rows)
Thank you..
Regards,
Ramesh G
Re: Performance Issue (Not using Index when joining two tables).
On Sun, Sep 13, 2020 at 02:58:15PM +, Gopisetty, Ramesh wrote:
> [...]
> Hash Join  (cost=49299.91..51848.83 rows=651 width=42) (actual time=3512.692..3797.583 rows=228 loops=1)
> [...]
> Execution Time: 3797.768 ms
Most of the time (3460ms) is spent in the sequential scan on
chr_simple_val, and the seqscan on chr_emp_position is taking ~330ms.
Combined that's 3790ms out of 3797ms, so the join is pretty much
irrelevant.

Either the seqscans are causing a lot of I/O, or maybe the f_sel_*
functions in the filter are expensive. Judging by how few rows are in
the tables (not sure how large the tables are), I'd guess it's the
latter ... Hard to say without knowing what the functions do etc.
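One quick way to test that guess, using the table and function names
from the quoted plan (to be run against the original poster's database;
the comparison is illustrative, not something I've measured):

```sql
-- Time the filter functions in isolation, with no join involved:
EXPLAIN ANALYZE
SELECT f_sel_policy_ty_static(vpd_key),
       f_sel_policy_prod_locale((ctc.*)::character varying, prod_locale_code)
FROM chr_simple_val ctc;

-- Compare against a plain scan of the same table with no function calls:
EXPLAIN ANALYZE SELECT vpd_key FROM chr_simple_val;
```

If the first query accounts for most of the 3460ms, the functions (not
I/O) are the bottleneck.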
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Performance Issue (Not using Index when joining two tables).
Tomas Vondra writes:
> Most of the time (3460ms) is spent in the sequential scan on
> chr_simple_val, and the seqscan on chr_emp_position is taking ~330ms.
> Combined that's 3790ms out of 3797ms, so the join is pretty much
> irrelevant.
>
> Either the seqscans are causing a lot of I/O, or maybe the f_sel_*
> functions in the filter are expensive. Judging by how few rows are in
> the tables (not sure how large the tables are), I'd guess it's the
> latter ... Hard to say without knowing what the functions do etc.

I think the OP is wishing that the filter functions for the larger
table would be postponed till after the join condition is applied. I'm
a little dubious that that's going to save anything meaningful; but
maybe increasing the cost attributed to those functions would persuade
the planner to try it that way.

First though, does forcing a nestloop plan (turn off enable_hashjoin,
and enable_mergejoin too if needed) produce the shape of plan you want?
And if so, is it actually faster? Only if those things are true is it
going to be worth messing with costing parameters.

regards, tom lane
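The two experiments suggested above could look like this in a psql
session; the function name is taken from the quoted plan, but its
argument type and the COST value are assumptions for illustration only:

```sql
-- Experiment 1: force a nestloop plan and check whether it is in fact
-- faster than the hash join.
SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN ANALYZE
SELECT cep.home_dept_oid, ctc.oid, ctc.category, ctc.code
FROM chr_emp_position cep
JOIN chr_simple_val ctc ON ctc.oid = cep.home_dept_oid;
RESET enable_hashjoin;
RESET enable_mergejoin;

-- Experiment 2: raise the cost attributed to the filter function so the
-- planner prefers plans that evaluate it fewer times.  The default COST
-- for a PL function is 100; 10000 is just an illustrative large value,
-- and the argument type here is a guess.
ALTER FUNCTION f_sel_policy_ty_static(character varying) COST 10000;
```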
Re: Performance Issue (Not using Index when joining two tables).
> Filter: ((("current_user"())::text <> ANY ('{wfnadmin,skipvpd}'::text[]))
> AND f_sel_policy_all(vpd_key, 'CHR_EMP_POSITION'::character varying) AND
> f_sel_policy_prod_locale((cep.*)::character varying, prod_locale_code))

This looks like it may be infrastructure for row-level security. My
understanding is limited, but perhaps those restrictions are
influencing the planner's access to, or reliance on, the stats.

Also, it would seem that you need the entire table since you don't have
an explicit WHERE clause. Why would scanning an index and then also
visiting every row in the table be faster than just going directly to
the table?
