SELECT .. FOR UPDATE: find out who locked a row
Hello, I am using `SELECT * FROM queue ... FOR UPDATE SKIP LOCKED` to implement a queueing system. Now I wonder if it is possible, given the id of one of the locked rows in the queue table, to find out which connection/which transaction owns the lock. Any help is greatly appreciated. Thank you, /eno -- me at github: https://github.com/radiospiel me at linked.in: https://www.linkedin.com/in/radiospiel
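PS: For reference, the worker-side claim looks roughly like this (simplified; a sketch, not my exact production query):

```
-- Claim one job; SKIP LOCKED makes concurrent workers skip
-- rows that are already locked by another transaction.
BEGIN;

SELECT *
  FROM queue
 ORDER BY id
 LIMIT 1
   FOR UPDATE SKIP LOCKED;

-- ... process the job, then delete it or mark it done ...

COMMIT;  -- commit (or rollback) releases the row lock
```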
Re: SELECT .. FOR UPDATE: find out who locked a row
Hi Melvin, hi everyone else,

thank you for your support, and for your query example. And oh yes, I forgot to mention the Postgres version, which is 9.6; but if I find a solution which works in version 10 then I could probably update.

I guess with your query I can figure out which connection holds a lock, but it seems I cannot correlate those locks to the rows which are actually locked, since `pg_locks` does not seem to reference this in any way. To be more explicit: I can find out about all locks in the current database that are held by other connections using

```
select l.*
  from pg_locks l
  left join pg_database d on l.database = d.oid
 where pid <> pg_backend_pid()
   and d.datname = current_database()
   and relation::regclass = 'queue'::regclass;
```

which, with one locked row, results in something like this:

```
 locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |     mode     | granted | fastpath
----------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+--------------+---------+----------
 relation |   629976 |   638971 |      |       |            |               |         |       |          | 3/983554           | 60515 | RowShareLock | t       | t
(1 row)
```

And here is where I am stuck. `database`, `relation` and `pid` are accounted for; the only value I can't make sense of is the `virtualtransaction` entry. I was hoping that `objid` or `objsubid` would contain the OID of the locked row, but obviously I am missing a crucial piece of understanding :) (Note that I tried this both WITH OIDS and without OIDs on my table.)

Best,
/eno

-- me at github: https://github.com/radiospiel me at linked.in: https://www.linkedin.com/in/radiospiel

On 15 Mar 2018, at 22:12, Melvin Davidson wrote:

> On Thu, Mar 15, 2018 at 4:48 PM, Stephen Frost wrote:
>> Greetings,
>> * Enrico Thierbach (e...@open-lab.org) wrote:
>>> I am using `SELECT * FROM queue ... FOR UPDATE SKIP LOCKED` to implement a queueing system. Now I wonder if it is possible, given the id of one of the locked rows in the queue table, to find out which connection/which transaction owns the lock.
Sure, you can attempt to lock the record and then run pg_blocking_pids() (in another session) against the pid which is trying to acquire the lock.

Session #1:
  Connect
  SELECT * FROM queue ... FOR UPDATE SKIP LOCKED;
  ... gets back some id X
  ... waits

Session #2:
  Connect
  SELECT pg_backend_pid(); -- save this for the 3rd session
  SELECT * FROM queue WHERE id = X FOR UPDATE;
  ... gets blocked waiting for #1
  ... waits

Session #3:
  SELECT pg_blocking_pids(SESSION_2_PID); -- returns PID of Session #1

Obviously there are race conditions and whatnot (what happens if session #1 releases the lock?), but that should work to figure out who is blocking whom. If you're on a version of PG without pg_blocking_pids then you can look in the pg_locks view, though that's a bit more annoying to decipher.

Thanks!

Stephen

> Now I wonder if it is possible, given the id of one of the locked rows in the queue table, to find out which connection/which transaction owns the lock.

You have not specified which version of PostgreSQL, but try this query:

```
SELECT c.datname,
       c.pid AS pid,
       c.client_addr,
       c.usename AS user,
       c.query,
       c.wait_event,
       c.wait_event_type,
       /* CASE WHEN c.waiting = TRUE THEN 'BLOCKED' ELSE 'no' END AS waiting, */
       l.pid AS blocked_by,
       c.query_start,
       current_timestamp - c.query_start AS duration
  FROM pg_stat_activity c
  LEFT JOIN pg_locks l1 ON c.pid = l1.pid AND NOT l1.granted
  LEFT JOIN pg_locks l2 ON l1.relation = l2.relation AND l2.granted
  LEFT JOIN pg_stat_activity l ON l2.pid = l.pid
  LEFT JOIN pg_stat_user_tables t ON l1.relation = t.relid
 WHERE pg_backend_pid() <> c.pid
 ORDER BY datname, query_start;
```

--
Melvin Davidson
Maj. Database & Exploration Specialist
Universe Exploration Command – UXC
Employment by invitation only!
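For clarity, the three sessions spelled out as SQL (the pid value 12345 is just an example):

```
-- Session 1: the worker, holding the row lock
BEGIN;
SELECT * FROM queue FOR UPDATE SKIP LOCKED;   -- returns some id X

-- Session 2: deliberately block on that row
SELECT pg_backend_pid();                      -- e.g. 12345; note it down
SELECT * FROM queue WHERE id = X FOR UPDATE;  -- hangs behind session 1

-- Session 3: ask who blocks session 2 (PostgreSQL 9.6+)
SELECT pg_blocking_pids(12345);               -- returns {<pid of session 1>}
```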
Re: SELECT .. FOR UPDATE: find out who locked a row
Hi Melvin, Stephen, hi list,

> FWIW, I really don't understand your need to identify the actual rows that are locked. Once you have identified the query that is causing a block (which is usually due to "Idle in Transaction"), AFAIK the only way to remedy the problem is to kill the offending query, or wait for it to complete. I am not aware of any way available to a user to "unlock" individual rows. Indeed, if you could, it would probably lead to corruption of some form.

The goal is to run a job queue, with a potentially largish number of workers that feed off the queue. So it would be useful to find out which queue entry is being processed right now (I can easily find out: when a row cannot be read via SKIP LOCKED it is locked, and probably being worked upon). It would also be great to understand which worker holds the lock. The intention is NOT to kill the worker or its query.

With what the conversation brought up here (still trying to catch up with everything) I can:

1) determine all workers that currently are holding a lock (via Melvin's query);
2) on an individual basis, try to lock the row in a second connection and use a third connection to figure out which worker connection holds a lock on a specific single row (via Stephen's approach).

This is probably good enough to cover the necessary basic functionality, so thank you for your input.

Am I correct to assume that there is no other way to determine who is holding a lock on a specific row, and/or to determine this for many rows in one go? I guess I am also correct to assume that, whatever the worker is doing, there is no way to somehow write this information into the database **via the same connection**. (Using a second connection would obviously be easy.)

Best,
/eno

-- me at github: https://github.com/radiospiel me at linked.in: https://www.linkedin.com/in/radiospiel
Re: SELECT .. FOR UPDATE: find out who locked a row
Thanks Stephen,

> Evidently my second email got lost somewhere along the way. What you're looking for is an extension called 'pgrowlocks': https://www.postgresql.org/docs/10/static/pgrowlocks.html My prior email on that subject is here: https://www.postgresql.org/message-id/20180315220512.GV2416%40tamriel.snowman.net

I'll look into that as soon as I can after next week.

Best,
/eno
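PS: From a quick look at the docs, a pgrowlocks-based query would presumably look something like this (untested on my side; pgrowlocks reports one row per locked tuple, which can then be joined back via ctid):

```
CREATE EXTENSION IF NOT EXISTS pgrowlocks;

-- One result row per currently locked tuple in "queue",
-- including the pids of the locking backends.
SELECT q.id, rl.locked_row, rl.modes, rl.pids
  FROM pgrowlocks('queue') AS rl
  JOIN queue AS q ON q.ctid = rl.locked_row;
```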
Re: SELECT .. FOR UPDATE: find out who locked a row
> You probably considered this but the queuing mechanism I use doesn't hold locks on records during processing. Workers claim tasks by locking them, setting a claimed flag of some sort, then releasing the lock (including worker identity if desired), repeating the general procedure once completed. My volume is such that the bloat the extra update causes is not meaningful and is easily handled by (auto-)vacuum.
>
> David J.

Hi David,

well, I thought about it and then put it to rest initially, since I liked the idea that with a running job kept "inside" the transaction I would never have zombie entries there: if somehow the network connection gets lost for the client machines, the database would just roll back the transaction, the job would revert to its "ready-to-run" state, and the next worker would pick it up.

However, I will probably reconsider this, because it has quite some advantages; setting a "processing" state, let alone keeping a worker identity next to the job, seems much more straightforward.

So how do you solve the "zombie" situation?

Best,
Enrico
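PS: If I go down that road, the claim step would presumably look something like this (the status/worker/claimed_at columns and the timeout are my assumptions, not David's exact schema):

```
-- Claim: lock a ready row briefly, mark it, release the lock by committing.
UPDATE queue
   SET status = 'processing', worker = 'worker-42', claimed_at = now()
 WHERE id = (
         SELECT id FROM queue
          WHERE status = 'ready'
          ORDER BY id
          LIMIT 1
            FOR UPDATE SKIP LOCKED
       )
RETURNING id;

-- Zombie cleanup, run periodically: re-queue jobs whose worker died.
UPDATE queue
   SET status = 'ready', worker = NULL
 WHERE status = 'processing'
   AND claimed_at < now() - interval '10 minutes';
```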
Trouble matching a nested value in JSONB entries
Hi list,

I have some trouble matching a value in a JSONB object against multiple potential matches. Let's say I have a table with an id, and a metadata JSONB column, which holds data like the following:

```
1 | {"group_id": 1}
2 | {"group_id": 1}
3 | {"group_id": 2}
4 | {"group_id": 3}
```

I would like to run a query which gives me the result of

```
SELECT id FROM mytable WHERE metadata->>'group_id' IN ('1', '2');
```

Now, obviously I could use this query, but I would like to get away without an explicit index on `metadata->>'group_id'`, and I was hoping to find something using the JSONB containment operators, with support of a GiST or GIN index. The following seems to work

```
select * from mytable where (metadata @> '{"group_id":1}');
```

but only with a single value to match. I could, of course, also "denormalize" the query a la

```
select * from mytable
 where (metadata @> '{"group_id":1}')
    or (metadata @> '{"group_id":2}');
```

but this seems to call for long execution times; also, depending on the number of different tag names and values to match, this could really explode into quite a large query.

Stackoverflow suggests the use of ANY:

```
select * from mytable where (tags->'group_id' @> ANY( ARRAY ['1','3']::jsonb[] ));
```

https://dba.stackexchange.com/questions/130699/postgresql-json-query-array-against-multiple-values

This seems to work, but doesn't that require a group_id-specific index again? Anything I overlooked?

Best,
/eno

PS: Please note that I am currently on Postgres 9.5. An update, if necessary, would be possible though.

-- me at github: https://github.com/radiospiel me at linked.in: https://www.linkedin.com/in/radiospiel
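PPS: To make the intent concrete, what I am hoping for is something along these lines; my (possibly wrong) understanding is that the planner can serve both @> branches from a single generic GIN index by combining them with a BitmapOr:

```
-- One generic index instead of one index per key:
CREATE INDEX mytable_metadata_idx
    ON mytable USING gin (metadata jsonb_path_ops);

-- Each containment branch can use the index individually.
SELECT id
  FROM mytable
 WHERE metadata @> '{"group_id": 1}'
    OR metadata @> '{"group_id": 2}';
```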
Re: Trouble matching a nested value in JSONB entries
Oleg,

> > 1 | {"group_id": 1}
> > 2 | {"group_id": 1}
> > 3 | {"group_id": 2}
> > 4 | {"group_id": 3}
> >
> > PS: Please note that I am currently at postgres 9.5. An update, if necessary, would be possible though.
>
> Upgrade, please! I have only master 11beta2 right now:
>
> select * from qq where js @> '{"group_id":1}';
>  id |        js
> ----+-----------------
>   1 | {"group_id": 1}
>   2 | {"group_id": 1}
> (2 rows)

thanks for your answer. Your code does work fine on 9.5 already; what I would like to achieve is to get records where the group_id entry is 1 or 2, and I fear that

```
select * from mytable
 where (metadata @> '{"group_id":1}')
    or (metadata @> '{"group_id":2}');
```

while doable, would not make the best use of existing indices.

Any more ideas?

Best,
/eno
partitions vs indexes
Hello list,

I ran into some trouble with partitions: I would like to convert a table with a primary key into a partitioned setup, partitioned by a column which is not part of the primary key. Also, a column might hold a reference to a parent row. So this is my current table setup, slimmed down:

```
CREATE TYPE statuses AS ENUM ('ready', 'processing', 'done');

CREATE TABLE mytable (
    id        BIGSERIAL PRIMARY KEY NOT NULL,
    parent_id BIGINT REFERENCES mytable(id) ON DELETE CASCADE,
    status    statuses DEFAULT 'ready'
);
```

Since entries in the table are frequently accessed while their status is not 'done', I would like to partition by status. However, if I want to do that:

```
CREATE TABLE mytable (
    id        BIGSERIAL NOT NULL,
    parent_id BIGINT REFERENCES mytable(id) ON DELETE CASCADE,
    status    statuses DEFAULT 'ready'
    -- UNIQUE(id, status)  -- doesn't work: can't reference parent
    -- UNIQUE(id)          -- doesn't work: can't partition
) PARTITION BY LIST(status);
```

I need to add status to the primary key or another unique constraint. In that case, however, I can no longer have the foreign key constraint on parent_id. Weirdly enough, partitioning works fine if there is no unique constraint on that table:

```
CREATE TABLE mytable (
    id     BIGSERIAL NOT NULL,
    status statuses DEFAULT 'ready'
) PARTITION BY LIST(status);
```

So partitioning seems to require the partition column to be part of a unique constraint if and only if a unique constraint exists on the table at all. Also, I cannot create multiple unique constraints on the table.

Here come my questions:

- Am I missing something?
- I don't understand the requirement for the partition value to be part of a unique constraint if such a constraint exists, since partitioning seems to work fine if the table has no unique constraints at all. Can someone shed some light on that? Is that maybe an artificial limitation that will go away in the future?
- Any suggestions how I could proceed?

Thank you for any suggestion!

Best,
Eno

-- me on github: http://github.com/radiospiel
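PS: For completeness, this is the variant that does create without errors once status is pulled into the unique constraint; my sketch, to illustrate where the self-referencing FK gets lost:

```
CREATE TYPE statuses AS ENUM ('ready', 'processing', 'done');

-- The unique constraint must include the partition key, so it
-- becomes (id, status), and the FK on parent_id has to go.
CREATE TABLE mytable (
    id        BIGSERIAL NOT NULL,
    parent_id BIGINT,               -- FK dropped
    status    statuses DEFAULT 'ready',
    UNIQUE (id, status)
) PARTITION BY LIST (status);

CREATE TABLE mytable_hot  PARTITION OF mytable FOR VALUES IN ('ready', 'processing');
CREATE TABLE mytable_cold PARTITION OF mytable FOR VALUES IN ('done');
```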
Re: partitions vs indexes
On 2 Oct 2019, at 22:09, Enrico Thierbach wrote:

> [original message snipped]

and, errm, forgot to mention that I am on PostgreSQL 11.3. Sorry for that omission.

Best,
eno

-- me on github: http://github.com/radiospiel
Re: partitions vs indexes
On 2 Oct 2019, at 22:16, Michael Lewis wrote:

> "I would like to convert a table with a primary key into a partitioned setup by a column which is not part of the primary key"
>
> That isn't possible. The partition key must be contained by the primary key. That is, the primary key could be (site_id, id) and you can create a hash partition on id or site_id, but not on created_on. You could drop the primary key and foreign keys and implement them via trigger functions as described in this blog series, but it seems questionable: https://www.depesz.com/2018/10/31/foreign-key-to-partitioned-table-part-2/
>
> I do not assume the restriction would be dropped in future releases. I don't know that scanning all the partitions to figure out whether the primary key is violated would be advisable.

Which is what the trigger functions described in the blog post have to do, right?

> It might be noteworthy that partitioning with more than 10-100 partitions is MUCH faster in PG12 than PG11 (up to 4-8 thousand partitions), from testing shared by those working on that code.

Michael, thank you for your response. I think I now grasp the idea: if there is a uniqueness constraint, then the database would rather not visit all partitions to check for a constraint violation, but wants to identify the single partition that might fail it; hence any partitioning value must be a subset of, or be identical to, the uniqueness constraint.

I get that this makes sense if you have *many* partitions; however, I basically want to end up with two partitions, "hot" and "cold". A row's lifetime starts in the hot partition, and, after being processed, it moves into the cold partition. Most of the work actually happens in the hot partition, so I think keeping it as small as possible is probably helpful. For numbers: the hot partition would typically contain ~1 rows; the cold partition, on the other hand, will have tens of millions.
At the same time I still want to be able to look up a row by its id in the root relation, not in the concrete partitions. Having the database validate a uniqueness constraint in two tables instead of one would be a worthwhile sacrifice for me.

Having said that, I just realized I could probably reach my goal by setting up explicit hot and cold tables, moving rows around manually whenever their "hotness" changes, and setting up a view which combines both tables into a single relation. I would only have to drop all explicit FK references from the schema. A downside, certainly, but one that I could live with.

Best,
/eno

-- me on github: http://github.com/radiospiel
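PS: Sketched out, the manual variant would look roughly like this (table and view names are mine):

```
-- Two plain tables, each with its own primary key ...
CREATE TABLE mytable_hot (
    id     BIGINT PRIMARY KEY,
    status statuses DEFAULT 'ready'
);
CREATE TABLE mytable_cold (LIKE mytable_hot INCLUDING ALL);

-- ... combined into a single relation for lookups by id.
CREATE VIEW mytable AS
    SELECT * FROM mytable_hot
    UNION ALL
    SELECT * FROM mytable_cold;

-- Moving a finished row into the cold table in one statement:
WITH moved AS (
    DELETE FROM mytable_hot WHERE id = 42 RETURNING id
)
INSERT INTO mytable_cold (id, status)
SELECT id, 'done'::statuses FROM moved;
```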