Fast serializable transactions starving slower ones

2018-06-06 Thread Viktor Fougstedt

Hello!

We have a system with a lot of integrity constraints that are not easily
expressed as SQL constraints. We therefore run all writing transactions in
serializable isolation, so that our code can issue SELECTs to check the
constraints.

We’ve run into something that isn’t exactly a “problem”, since everything is
working according to spec. But some of our transactions are much quicker than
the others. Since the first COMMIT of colliding serializable transactions
always wins, the short transactions starve the slower ones, which are
restarted again and again.

We’re now working on a solution where every transaction starts by taking an
advisory lock. Normally a transaction releases it immediately, but a
transaction that has been restarted multiple times keeps it, preventing any
others from starting. It can then run to completion, normally after at most
one more restart.
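
A minimal sketch of the scheme, for concreteness (the key 42 is an arbitrary
example; pg_advisory_lock/pg_advisory_unlock are session-level, so the lock
can be held across COMMIT until explicitly released):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT pg_advisory_lock(42);      -- every writing transaction queues on one key
    SELECT pg_advisory_unlock(42);    -- normal case: released immediately
    -- ... integrity-checking SELECTs and writes ...
    COMMIT;
    -- A transaction restarted too many times skips the unlock above,
    -- holds the lock for its whole run, and calls pg_advisory_unlock(42)
    -- only after COMMIT, so no competitor can force another restart.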

Are there any other solutions to this starvation problem? 

Regards,
/Viktor

Incredibly slow queries on information_schema.constraint_column_usage?

2018-01-12 Thread Viktor Fougstedt

Hello!

We have a program which auto-generates tables and constraints, run against 
PostgreSQL 9.5.

On startup, the program checks the current database schema to detect changes 
that are needed. For one of our database servers, some selects from 
information_schema are incredibly slow:

select count(*) from information_schema.constraint_column_usage;
 count
-------
   844
(1 row)

The above query takes 55 seconds to run.

explain select count(*) from information_schema.constraint_column_usage;
 Aggregate  (cost=6607288.37..6607288.38 rows=1 width=0)
   ->  Append  (cost=66.87..6446680.69 rows=64243070 width=0)

 [complete output at the end of this email]

Is the database instance broken somehow, or are execution times like these to
be expected? Can I do something differently?
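
For comparison, this is the kind of direct catalog query we could fall back
to; a rough sketch of the basic constraint-to-column mapping, presumably not
semantically identical to the information_schema view:

    select c.conname, a.attname
    from pg_constraint c
    join pg_attribute a
      on a.attrelid = c.conrelid
     and a.attnum = any (c.conkey)
    where not a.attisdropped;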

Grateful for any hints.

Regards,
/Viktor


Complete explain plan:

Aggregate  (cost=6607288.37..6607288.38 rows=1 width=0)
  ->  Append  (cost=66.87..6446680.69 rows=64243070 width=0)
        ->  Subquery Scan on "*SELECT* 1"  (cost=66.87..66.89 rows=1 width=0)
              ->  HashAggregate  (cost=66.87..66.88 rows=1 width=324)
                    Group Key: nr.nspname, r.relname, r.relowner, a.attname, nc.nspname, c.conname
                    ->  Nested Loop  (cost=38.06..66.86 rows=1 width=324)
                          ->  Nested Loop  (cost=37.80..66.56 rows=1 width=264)
                                ->  Nested Loop  (cost=37.38..66.10 rows=1 width=200)
                                      ->  Nested Loop  (cost=36.96..57.87 rows=1 width=136)
                                            ->  Hash Join  (cost=36.54..40.97 rows=2 width=132)
                                                  Hash Cond: (nc.oid = c.connamespace)
                                                  ->  Seq Scan on pg_namespace nc  (cost=0.00..3.94 rows=94 width=68)
                                                  ->  Hash  (cost=36.51..36.51 rows=2 width=72)
                                                        ->  Seq Scan on pg_constraint c  (cost=0.00..36.51 rows=2 width=72)
                                                              Filter: (contype = 'c'::"char")
                                            ->  Index Scan using pg_depend_depender_index on pg_depend d  (cost=0.42..8.44 rows=1 width=12)
                                                  Index Cond: ((classid = '2606'::oid) AND (objid = c.oid))
                                                  Filter: (refclassid = '1259'::oid)
                                      ->  Index Scan using pg_attribute_relid_attnum_index on pg_attribute a  (cost=0.42..8.21 rows=1 width=70)
                                            Index Cond: ((attrelid = d.refobjid) AND (attnum = d.refobjsubid))
                                            Filter: (NOT attisdropped)
                                ->  Index Scan using pg_class_oid_index on pg_class r  (cost=0.41..0.46 rows=1 width=76)
                                      Index Cond: (oid = a.attrelid)
                                      Filter: (pg_has_role(relowner, 'USAGE'::text) AND (relkind = 'r'::"char"))
                          ->  Index Scan using pg_namespace_oid_index on pg_namespace nr  (cost=0.27..0.29 rows=1 width=68)
                                Index Cond: (oid = r.relnamespace)
        ->  Subquery Scan on "*SELECT* 2"  (cost=2512.13..6446613.80 rows=64243069 width=0)
              ->  Nested Loop  (cost=2512.13..5804183.11 rows=64243069 width=0)
                    Join Filter: CASE WHEN (c_1.contype = 'f'::"char") THEN ((r_1.oid = c_1.confrelid) AND (a_1.attnum = ANY (c_1.confkey))) ELSE ((r_1.oid = c_1.conrelid) AND (a_1.attnum = ANY (c_1.conkey))) END
                    ->  Hash Join  (cost=2507.01..22250.59 rows=153142 width=6)
                          Hash Cond: (a_1.attrelid = r_1.oid)
                          ->  Seq Scan on pg_attribute a_1  (cost=0.00..16389.75 rows=485975 width=6)
                                Filter: (NOT attisdropped)
                          ->  Hash  (cost=2316.95..2316.95 rows=15205 width=4)
                                ->  Hash Join  (cost=5.12..2316.95 rows=15205 width=4)
                                      Hash Cond: (r_1.relnamespace = nr_1.oid)
                                      ->  Seq Scan on pg_class r_1  (cost=0.00..2102.76 rows=15205 width=8)
                                            Filter: (pg_has_role(relowner, 'USAGE'::text) AND (relkind = 'r'::"char"))
                                      ->  Hash  (cost=3.94..3.94 rows=94 width=4)
                                            ->  Seq Scan on pg_namespace nr_1  (cost=0.00..3.94 rows=94 width=4)
                    ->  Materialize  (cost=5.12..58.41 rows=839 width=55)
                          ->  Hash Join  (cost=5.12..54.21 rows=839 width=55)
                                Hash Cond: (c_1.connamespace = nc_1.oid)

Re: Given a set of dateranges, finding the continuous range that includes a particular date

2018-02-23 Thread Viktor Fougstedt

Hi.

This answer may be useful if I’ve understood your problem correctly, but I
might have interpreted it wrongly. :-)


I would probably start with merging intervals so that overlapping and adjacent 
intervals become single continuous intervals, then select from those merged 
intervals.

We have an application with a lot of interval handling in PostgreSQL, and we
use many variants of an algorithm by Itzik Ben Gan which he calls “packing
intervals”. The post we started from was an old reader’s challenge from SQL
Magazine. It has since been updated, as MS-SQL’s support for window functions
has improved (Itzik Ben Gan is an MS-SQL guy).

Basically, it is a few CTEs which convert the intervals into “start” (+1) and
“stop” (-1) events, keep a running sum of these, and finally create new
intervals by noting that each merged (or “packed”) interval starts at an event
with sum=0 before it and stops at an event with sum=0 after it.
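
As a rough illustration of the idea, here is a minimal sketch with made-up
sample input (Ben Gan’s articles develop it much more carefully):

    WITH ranges(r) AS (                       -- made-up sample input
        VALUES (daterange('2018-01-01','2018-01-10')),
               (daterange('2018-01-05','2018-01-20')),
               (daterange('2018-02-01','2018-02-15'))
    ), events AS (                            -- +1 at each start, -1 at each stop
        SELECT lower(r) AS d, +1 AS delta FROM ranges
        UNION ALL
        SELECT upper(r) AS d, -1 AS delta FROM ranges
    ), counted AS (                           -- running sum; +1 sorts first at
        SELECT d, delta,                      -- equal dates, so adjacent ranges
               sum(delta) OVER (ORDER BY d, delta DESC  -- merge as well
                                ROWS UNBOUNDED PRECEDING) AS cnt
        FROM events
    ), boundaries AS (                        -- sum becomes 1 at a packed start,
        SELECT d,                             -- returns to 0 at a packed stop
               CASE WHEN delta = +1 THEN 'start' ELSE 'stop' END AS kind,
               row_number() OVER (PARTITION BY delta ORDER BY d) AS n
        FROM counted
        WHERE (delta = +1 AND cnt = 1) OR (delta = -1 AND cnt = 0)
    )
    SELECT daterange(s.d, e.d) AS packed      -- pair the n:th start with the n:th stop
    FROM boundaries s
    JOIN boundaries e ON e.n = s.n AND e.kind = 'stop'
    WHERE s.kind = 'start'
    ORDER BY packed;
    -- yields [2018-01-01,2018-01-20) and [2018-02-01,2018-02-15)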

It involves both CTEs and window functions and is quite a beautiful example of
SQL, IMHO.

I think it’s best to google it; he does a good job of explaining how it works.

Hope that helps a bit at least.

/Viktor

On 23 Feb 2018, at 01:44, Ken Tanzer <ken.tan...@gmail.com> wrote:

Hi, hoping to get some help with this.  I need to take a series of dateranges
and, given a specific date, return the single continuous daterange that
includes that date.

To elaborate a bit, I've got lots of tables that include start and end dates.  
For example:

CREATE TABLE tbl_staff_assign (
    staff_assign_id         SERIAL PRIMARY KEY,
    client_id               INTEGER NOT NULL REFERENCES tbl_client (client_id),
    staff_id                INTEGER REFERENCES tbl_staff (staff_id),
    staff_assign_type_code  VARCHAR(10) NOT NULL REFERENCES tbl_l_staff_assign_type (staff_assign_type_code),
    staff_assign_date       DATE NOT NULL,
    staff_assign_date_end   DATE,
...

So a client might leave a program and then return later, or they might simply
switch to another staff_id.  (In which case one record will have an end date,
and the next record will start on the next day.)  In this case I need to know
"what period were they continuously in the program that includes X date?"  So
I'd like to be able to do something like:

"SELECT staff_assign_date,continuous_daterange( staff_assign_date, (SELECT 
array_agg(daterange(staff_assign_date,staff_assign_date_end,'[]')
) FROM staff_assign sa2 WHERE sa2.client_id=sa.client_id) FROM staff_assign sa

I've done this before with procedures specific to a particular table, and 
working with the start and end dates.  I'm now wanting to try to do this once 
generically that will work for all my cases.  So I'm hoping to do this in a way 
that performance isn't horrible.  And it's a little unclear to me how much and 
how I might be able to use the daterange operators to accomplish this 
efficiently.
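
[If the merging itself is done with a packing query like the sketch earlier in
this message, the lookup step is presumably just a containment test. Here
packed_ranges is a hypothetical stand-in for the output of such a query:

    SELECT packed
    FROM packed_ranges                  -- hypothetical: result of a packing query
    WHERE packed @> DATE '2018-01-07';  -- range @> element

/Viktor]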

Any advice or suggestions or ways to go about this appreciated.  Thanks!

Ken

p.s.,  Another small wrinkle is that these records aren't always perfect, and
ideally I'd allow for an optional fudge factor that would let small gaps be
ignored.  I could just add that in every query (start_date+2,end_date-2), but
it might be nice to have the function do it, if that didn't badly hurt
performance.


--
AGENCY Software
A Free Software data system
By and for non-profits
http://agency-software.org/
https://demo.agency-software.org/client
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list to learn more about AGENCY or follow the
discussion.