SELECT .. FOR UPDATE: find out who locked a row

2018-03-15 Thread Enrico Thierbach

Hello,

I am using `SELECT * FROM queue ... FOR UPDATE SKIP LOCKED` to implement 
a queueing system.


Now I wonder if it is possible, given the id of one of the locked rows 
in the queue table, to find out which connection/which transaction owns 
the lock.


Any help is greatly appreciated.

Thank you,
/eno

--
me at github: https://github.com/radiospiel
me at linked.in: https://www.linkedin.com/in/radiospiel


Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-15 Thread Enrico Thierbach

Hi Melvin, hi everyone else,

thank you for your support, and for your query example. And oh yes, I 
forgot to mention the postgres version, which is 9.6; but if I find a 
solution which works in Version 10 then I could probably update.


I guess with your query I can figure out which connection holds a lock, 
but it seems I cannot correlate those locks to the rows which actually 
are locked, since `pg_locks` seems not to reference this in any way.


To be more explicit: I can find out about all locks in the current 
database that are held by other connections using


```
select l.* from pg_locks l
left join pg_database d on l.database=d.oid
where pid <> pg_backend_pid()
  and d.datname = current_database()
  and relation::regclass=''::regclass;
```

which, with one locked row, results in something like this:

```
locktype | database | relation | page | tuple | virtualxid | 
transactionid | classid | objid | objsubid | virtualtransaction |  pid  
| mode | granted | fastpath

--+--+--+--+---++---+-+---+--++---+--+-+--
 relation |   629976 |   638971 |  |   ||   
| |   |  | 3/983554   | 60515 | 
RowShareLock | t   | t

(1 row)
```

And here is where I am stuck.

`database`, `relation` and `pid` are accounted for - the only value I 
can't make sense of is the `virtualtransaction` entry.


I was hoping that objid or objsubid would contain the OID of the locked 
row, but obviously I miss a crucial piece of understanding :)

(Note that I tried this both WITH OID and without oid in my table.)

Best,
/eno



--
me at github: https://github.com/radiospiel
me at linked.in: https://www.linkedin.com/in/radiospiel

On 15 Mar 2018, at 22:12, Melvin Davidson wrote:

On Thu, Mar 15, 2018 at 4:48 PM, Stephen Frost  
wrote:



Greetings,

* Enrico Thierbach (e...@open-lab.org) wrote:
I am using `SELECT * FROM queue ... FOR UPDATE SKIP LOCKED` to 
implement

a

queueing system.

Now I wonder if it is possible, given the id of one of the locked 
rows in
the queue table, to find out which connection/which transaction owns 
the

lock.


Sure, you can attempt to lock the record and then run 
pg_blocking_pids()

(in another session) against the pid which is trying to acquire the
lock.

Session #1:

Connect
SELECT * FROM queue ... FOR UPDATE SKIP LOCKED;
... gets back some id X
... waits

Session #2:

Connect
SELECT pg_backend_pid(); -- save this for the 3rd session
SELECT * FROM queue WHERE id = X FOR UPDATE;
... get blocked waiting for #1
... waits

Session #3:

SELECT pg_blocking_pids(SESSION_2_PID);
-- returns PID of Session #1

Obviously there's race conditions and whatnot (what happens if 
session

#1 releases the lock?), but that should work to figure out who is
blocking who.

If you're on a version of PG without pg_blocking_pids then you can 
look

in the pg_locks view, though that's a bit more annoying to decipher.

Thanks!

Stephen



Now I wonder if it is possible, given the id of one of the locked 
rows in

the queue table, to find out which connection/which transaction owns


You have not specified which version of PostgreSQL, but try this 
query.


SELECT c.datname,
   c.pid as pid,
   c.client_addr,
   c.usename as user,
   c.query,
   c.wait_event,
   c.wait_event_type,
/*   CASE WHEN c.waiting = TRUE
THEN 'BLOCKED'
ELSE 'no'
END as waiting,
*/
  l.pid as blocked_by,
   c.query_start,
   current_timestamp - c.query_start as duration
  FROM pg_stat_activity c
  LEFT JOIN pg_locks l1 ON (c.pid = l1.pid and not l1.granted)
  LEFT JOIN pg_locks l2 on (l1.relation = l2.relation and l2.granted)
  LEFT JOIN pg_stat_activity l ON (l2.pid = l.pid)
  LEFT JOIN pg_stat_user_tables t ON (l1.relation = t.relid)
 WHERE pg_backend_pid() <> c.pid
ORDER BY datname,
 query_start;

--
*Melvin Davidson*
*Maj. Database & Exploration Specialist*
*Universe Exploration Command – UXC*
Employment by invitation only!





Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-16 Thread Enrico Thierbach

Hi Melvin, Stephen, hi list,

*FWIW, I really don't understand your need to identify the actual rows 
that

are locked. Once you have identified the query that is causing a block
(which is usually due to "Idle in Transaction"), AFAIK the only way to
remedy the problem is to kill the offending query, or wait for it to
complete. I am not aware of any way available to a user to "unlock"
individual rows". Indeed, if you could, it would probably lead to
corruption of some form.*


The goal is to run a job queue, with a potentially largish number of 
workers that feed of the queue. So it would be useful to find out which 
queue entry is being processed right now (I can easily find out: when a 
row cannot be read via SKIP UNLOCKED it is locked, and probably being 
worked upon.) It would also be great to understand which worker holds 
the lock. The intention is NOT to kill the worker or its query.


With what the conversation brought up here (still trying to catch up 
with everything) I can:


1) determine all workers that currently are holding a lock (via 
Melvin’s);
2) on an individual base try to lock the row in a second connection and 
use a third connection to figure out which worker connection holds a 
lock on a specific single row (via Stephen’s).


This is probably good enough to cover the necessary basic functionality, 
so thank you for your input.


Am I correct to assume that there is no other way to determine who is 
holding a lock on a specific row and/or determine this for many rows in 
one go?


(I guess I am also correct to assume that whatever the worker is doing 
there is no way to somehow write this information into the database 
**via the same connection**. (Using a second connection would be 
obviously easy)


Best,
/eno

--
me at github: https://github.com/radiospiel
me at linked.in: https://www.linkedin.com/in/radiospiel



Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-16 Thread Enrico Thierbach

Thanks Steven,

Evidently my second email got lost somewhere along the way- what 
you're

looking for is an extension called 'pgrowlocks':

https://www.postgresql.org/docs/10/static/pgrowlocks.html

My prior email on that subject is here:

https://www.postgresql.org/message-id/20180315220512.GV2416%40tamriel.snowman.net



I’ll look into that as soon after next week.

Best,
/eno



Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-20 Thread Enrico Thierbach




--
me at github: https://github.com/radiospiel
me at linked.in: https://www.linkedin.com/in/radiospiel

​You probably considered this but the queuing mechanism I use 
doesn't hold
locks on records during processing.  Workers claim tasks by locking 
them,

setting a claimed flag of some sort, the releasing the lock (including
worker identity if desired) - repeating the general procedure once
completed.

My volume is such that the bloat the extra update causes is not 
meaningful

and is easily handled by (auto-)vacuum​.

David J.


Hi David,

well, I though about it and then put it to rest initially, since I liked 
the idea that with a running job kept “inside” the transaction I 
would never have zombie entries there: if somehow the network connection 
gets lost for the client machines the database would just rollback the 
transaction, the job would revert to its “ready-to-run” state, and 
the next worker would pick it up.


However, I will probably reconsider this, because it has quite some 
advantages; setting a “processing” state, let alone keeping a worker 
identitiy next to the job seems much more straightforward.


So how do you solve the “zombie” situation?

Best,
Enrico




Trouble matching a nested value in JSONB entries

2018-06-20 Thread Enrico Thierbach

Hi list,

I have some trouble matching a value in a JSONB object against multiple 
potential matches.


Lets say, I have a table with an id, and a metadata JSONB column, which 
holds data like the following


1 | {"group_id": 1}
2 | {“group_id": 1}
3 | {“group_id": 2}
4 | {“group_id": 3}

I would like to run a query which gives me the result of `SELECT id FROM 
mytable WHERE metadata->>’group_id’ IN (1,2)`. Now, obviously I 
could use this query, but I
would like to get away without an explicit index on 
`metadata->>’group_id’`, and I was hoping to find something using 
the JSONB containment operators, with support

of a gist or gin index.

The following seems to work

select * from mytable where (metadata @> '{"group_id":1}')

but only with a single value to match.

I could, of course, also “denormalize” the query a la

select * from mytable where (metadata @> '{"group_id":1}') OR 
(metadata @> '{"group_id”:2}’)


but this seems to call for long execution times; also, depending on the 
number of different tag names and values to match this could really 
explode into quite a

large query.

Stackoverflow suggests the use of ANY

select * from mytable where (tags->'group_id' @> ANY( ARRAY 
['1','3']::jsonb[] ) );


https://dba.stackexchange.com/questions/130699/postgresql-json-query-array-against-multiple-values

This seems to work - but doesn’t that require a group_id specific 
index again?


Anything I overlooked?

Best,
/eno

PS: Please note that I am currently at postgres 9.5. An update, if 
necessary, would be possible though.


--
me at github: https://github.com/radiospiel
me at linked.in: https://www.linkedin.com/in/radiospiel


Re: Trouble matching a nested value in JSONB entries

2018-07-03 Thread Enrico Thierbach

Oleg,


1 | {"group_id": 1}
2 | {“group_id": 1}
3 | {“group_id": 2}
4 | {“group_id": 3}




PS: Please note that I am currently at postgres 9.5. An update, if
necessary, would be possible though.


Upgrade, please !

I have only master 11beta2 right now:

select * from qq  where js @> '{"group_id":1}';
 id |   js
+-
  1 | {"group_id": 1}
  2 | {"group_id": 1}
(2 rows)



thanks for your answer. Your code does work fine on 9.5 already; what I 
would like to achieve is to get records where the group_id entry is 1 or 
2, and a fear that


select * from mytable where (metadata @> '{"group_id":1}') OR 
(metadata @> '{"group_id”:2}’)


- while doable - would not make the best use of existing indices.

Any more ideas?
Best,
/eno




partitions vs indexes

2019-10-02 Thread Enrico Thierbach

Hello list,

I run into some trouble with partitions:

I would like to convert a table with a primary key into a partitioned 
setup by a column which is not part of the primary key. Also, a column 
might hold a referenece to a parent row. So this is my current table 
setup, slimmed down:



CREATE TYPE statuses AS ENUM ('ready', ‘processing’, 
‘done’);


CREATE TABLE mytable (
  id  BIGSERIAL PRIMARY KEY NOT NULL,
  parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE,
  status statuses DEFAULT 'ready'
);

Since entries in the table are often frequented when status is not 
‘done’ I would like to partition by state. However, if I want to do 
that:


CREATE TABLE mytable (
  id  BIGSERIAL NOT NULL,
  parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE,
  status statuses DEFAULT 'ready'

  -- UNIQUE(id, status)-- doesn’t work: can’t reference 
parent

  -- UNIQUE(id)-- doesn’t work: can’t partition
) PARTITION BY LIST(status);

I need to add status to the primary key or another unique constraint. In 
that case, however, I can no longer have the foreign key constraint on 
parent_id.


Weirdly enough partitioning works fine if there is no unique constraint 
on that table:


CREATE TABLE mytable (
  id  BIGSERIAL NOT NULL,
  status statuses DEFAULT 'ready'
) PARTITION BY LIST(status);


So partitioning seems to require the column being in a unique constraint 
if and only if a unique constraint exist on the table. Also I cannot 
create multiple unique constraints on the table.


Here comes my question:

- Do I miss something?
- ThI don’t understand the requirement the partition value to be part 
of a unique constraint if such a constraint exists, since partitioning 
seems to work fine if the table has no unique constraints at all. Can 
someone shed some light on that? Is that maybe an artificial limitation 
that will go away on the future?

- Any suggestions how I could proceed?

Thank you for any suggestion!

Best,
Eno

--
me on github: http://github.com/radiospiel


Re: partitions vs indexes

2019-10-02 Thread Enrico Thierbach

On 2 Oct 2019, at 22:09, Enrico Thierbach wrote:


Hello list,

I run into some trouble with partitions:

I would like to convert a table with a primary key into a partitioned 
setup by a column which is not part of the primary key. Also, a column 
might hold a referenece to a parent row. So this is my current table 
setup, slimmed down:



CREATE TYPE statuses AS ENUM ('ready', ‘processing’, 
‘done’);


CREATE TABLE mytable (
  id  BIGSERIAL PRIMARY KEY NOT NULL,
  parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE,
  status statuses DEFAULT 'ready'
);

Since entries in the table are often frequented when status is not 
‘done’ I would like to partition by state. However, if I want to 
do that:


CREATE TABLE mytable (
  id  BIGSERIAL NOT NULL,
  parent_id   BIGINT REFERENCES mytable(id) ON DELETE CASCADE,
  status statuses DEFAULT 'ready'

  -- UNIQUE(id, status)-- doesn’t work: can’t 
reference parent
  -- UNIQUE(id)-- doesn’t work: can’t 
partition

) PARTITION BY LIST(status);

I need to add status to the primary key or another unique constraint. 
In that case, however, I can no longer have the foreign key constraint 
on parent_id.


Weirdly enough partitioning works fine if there is no unique 
constraint on that table:


CREATE TABLE mytable (
  id  BIGSERIAL NOT NULL,
  status statuses DEFAULT 'ready'
) PARTITION BY LIST(status);


So partitioning seems to require the column being in a unique 
constraint if and only if a unique constraint exist on the table. Also 
I cannot create multiple unique constraints on the table.


Here comes my question:

- Do I miss something?
- ThI don’t understand the requirement the partition value to be 
part of a unique constraint if such a constraint exists, since 
partitioning seems to work fine if the table has no unique constraints 
at all. Can someone shed some light on that? Is that maybe an 
artificial limitation that will go away on the future?

- Any suggestions how I could proceed?

Thank you for any suggestion!

Best,
Eno

--
me on github: http://github.com/radiospiel


and, errm, forgot to mention thatI am on postgresql 11.3. Sorry for that 
omission.


Best,
eno

--
me on github: http://github.com/radiospiel


Re: partitions vs indexes

2019-10-03 Thread Enrico Thierbach

On 2 Oct 2019, at 22:16, Michael Lewis wrote:


"I would like to convert a table with a primary key into a partitioned
setup by a column which is not part of the primary key"

That isn't possible. The partition key must be contained by the 
primary
key. That is, the primary key could be site_id, id and you can create 
hash

partition on id or site_id but not created_on.

You could drop primary key and foreign keys and implement them via 
trigger

functions as described in this blog series, but it seems questionable-
https://www.depesz.com/2018/10/31/foreign-key-to-partitioned-table-part-2/

I do not assume the restriction would be dropped in future releases. I
don't know that scanning all the partitions to figure out whether the
primary key is violated would be advisable. Which is what the trigger
functions described in the blog post has to do, right?

It might be noteworthy that partitioning with more than 10-100 
partitions

is MUCH faster in PG12 than PG11 (up to 4-8 thousand partitions) from
testing shared by those working on that code.






Michael, thank you for your response. I think I now grasp the idea: if 
there is a uniqueness constraint, then the database would rather not 
visit all partitions to check for constraint violation, but want to 
identify the single partition that might fail that; hence any 
partitioning value must be a subset of or be identical to the uniqueness 
constraint.


I get that this makes sense if you have *many* partitions; however, I 
basically want to end up with two partitions, “hot” and “cold”. 
A row’s lifetime starts in a hot partition, and, after being 
processed, moves into the cold partition.


Most of the work actually happens in the hot partition, so I think 
having this as small as possible is probably helpful. For numbers: the 
hot partition would tyically contain ~1 rows, the cold partition, on 
the other hand, will have 10s of millions. At the same time I still want 
to be able to look up a row by its id in the root relation, not in the 
concrete partitions. Having the database validate a uniqueness 
constraint in two tables instead of in one would be a worthwhile 
sacrifice for me.


Having said that I just realized I could probably reach my goal by 
setting up explicit hot and cold tables, move rows around manually 
whenever their “hotness” changes, and set up a view which combines 
both tables into a single relation. I would only have to drop all 
explicit FK references from the schema. A downside, certainly, but one 
that I could live with.


Best,
/eno


--
me on github: http://github.com/radiospiel