Conflict between recovery thread and client queries on a hot standby replica

2018-09-09 Thread Kim Rose Carlsen
Hi


It seems that the recovery process (streaming replication) and SQL queries 
against a PostgreSQL hot standby can produce a deadlock.


If you have conflicting statements running on the master and the slave, you can 
end up with queries on the slave waiting for locks held by the recovery process 
and the recovery process waiting for locks held by the SQL backends. The locks 
are never resolved and no client is aborted. This happens when using


max_standby_streaming_delay=-1


which is documented to let the recovery process wait indefinitely for a SQL 
backend, but I would still expect deadlocks to be detected and handled.


How to reproduce:

- Setup pgsql master/slave with streaming replication

- On the slave, set max_standby_streaming_delay = -1 (in postgresql.conf)

- Start a thread on the master that repeatedly runs the following

BEGIN;
  CREATE OR REPLACE VIEW va AS SELECT 1;
  CREATE OR REPLACE VIEW vb AS SELECT 2;
COMMIT;
- Start multiple threads on the slave that repeatedly run the following (with 5 
workers running, a deadlock is produced within minutes; a query for observing 
the resulting lock waits is sketched after these steps)
SELECT * FROM vb;
SELECT * FROM va;
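
To make the conflict visible, something like the following can be run on the 
slave while the workers are blocked. This is only a rough sketch and assumes a 
version where pg_stat_activity also lists the startup (recovery) process and 
its wait events:

-- Rough sketch: list ungranted lock requests on the standby, together with
-- what each waiting process is doing. When the hang occurs, the startup
-- (recovery) process and the blocked SELECTs should show up waiting on each
-- other.
SELECT a.pid,
       a.backend_type,
       a.wait_event_type,
       a.wait_event,
       l.locktype,
       l.relation::regclass AS relation,
       l.mode,
       a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE NOT l.granted
 ORDER BY a.pid;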


Expected behaviour:

  The 5 threads will remain able to query the views va and vb.


Observed behaviour:

  After some time, the 5 threads are blocked from reading the views, and they 
are not granted access again until all queries are aborted and restarted.


I have built a Docker Compose file to reproduce my observation.

https://github.com/kimc78/postgresql-slave-deadlock




Best regards
Kim Carlsen



Re: survey: pg_stat_statements total_time and entry deallocation

2018-09-11 Thread Kim Rose Carlsen
Here is some data from our production. I hope it can be of any use to you.

- System 1:
hiper=> select
          bucket
          ,count(*) entries
          ,max(calls) max_calls
          ,round(sum(total_time)) total_time
          ,round((100*sum(total_time)/avg(total_total_time))::numeric,2) pct_time
          ,round(sum(rows)) "rows"
          ,round((100*sum(rows)/avg(total_rows))::numeric,2) pct_rows
        from
          (
            select
              ntile(20) over (order by calls) bucket
              ,calls
              ,total_time
              ,sum(total_time) over () total_total_time
              ,rows
              ,sum(rows) over () total_rows
            from pg_stat_statements
          ) stmts
        group by rollup(bucket) order by bucket;
 bucket | entries | max_calls | total_time | pct_time |       rows | pct_rows
--------+---------+-----------+------------+----------+------------+----------
      1 |     245 |        71 |    4745479 |     0.38 |     640677 |     0.04
      2 |     245 |        96 |   22151762 |     1.76 |     236827 |     0.01
      3 |     245 |       122 |    9028387 |     0.72 |     297861 |     0.02
      4 |     244 |       167 |    4711705 |     0.38 |     328928 |     0.02
      5 |     244 |       228 |    9490670 |     0.76 |     337712 |     0.02
      6 |     244 |       305 |    7296024 |     0.58 |     273740 |     0.02
      7 |     244 |       394 |   35828651 |     2.85 |    1140064 |     0.07
      8 |     244 |       540 |   34180388 |     2.72 |    1313171 |     0.08
      9 |     244 |       711 |   29748121 |     2.37 |     865894 |     0.05
     10 |     244 |       989 |   12864432 |     1.02 |    1665529 |     0.10
     11 |     244 |      1507 |    4009346 |     0.32 |    1295032 |     0.08
     12 |     244 |      2511 |   13444734 |     1.07 |    4711699 |     0.30
     13 |     244 |      4567 |  401096681 |    31.94 |    3386595 |     0.21
     14 |     244 |      8086 |    4750899 |     0.38 |    8236002 |     0.52
     15 |     244 |     13356 |   19875345 |     1.58 |    6040996 |     0.38
     16 |     244 |     22454 |   23361859 |     1.86 |   16906926 |     1.06
     17 |     244 |     59660 |   68633113 |     5.46 |   40170089 |     2.52
     18 |     244 |    141667 |   59768727 |     4.76 |   76054887 |     4.77
     19 |     244 |    431946 |  330488976 |    26.31 |  213238961 |    13.38
     20 |     244 | 170978486 |  160486607 |    12.78 | 1216933189 |    76.34
        |    4883 | 170978486 | 1255961906 |   100.00 | 1594074779 |   100.00
(21 rows)

hiper=> select current_setting('pg_stat_statements.max');
 current_setting 
-----------------
 5000
(1 row)
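
Since the thread is about entry deallocation, a quick way to relate this data 
to the deallocation threshold is something like the following (a rough sketch; 
once the number of tracked entries reaches pg_stat_statements.max, the 
least-used entries start getting discarded):

-- Rough sketch: how close the statement hash table is to its maximum size.
SELECT count(*) AS entries,
       current_setting('pg_stat_statements.max')::int AS max_entries
  FROM pg_stat_statements;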





How to investigate what postgres is spending time on

2018-09-23 Thread Kim Rose Carlsen
Hi


I have some simple INSERT / UPDATE queries that take a long time the first time 
they are run in our test environment, but I'm not sure what Postgres is doing 
or what I can do to help it. What they have in common is that the tables 
contain many rows, on the order of 20 million.


Query:

INSERT INTO communication.request_parameter (request_id, 
template_version_parameter_id, parameter_value)
 VALUES (1222, 1211, 122) RETURNING request_parameter_id


Row from pg_stat_statements (expanded display):

userid              | 16385
dbid                | 16389
queryid             | 2064198912
query               | INSERT INTO communication.request_parameter (request_id, tem
calls               | 98
total_time          | 646.393451
min_time            | 0.03
max_time            | 638.712758
mean_time           | 6.59585154081633
stddev_time         | 64.1818799227704
rows                | 98
shared_blks_hit     | 2850
shared_blks_read    | 24
shared_blks_dirtied | 21
shared_blks_written | 0
local_blks_hit      | 0
local_blks_read     | 0
local_blks_dirtied  | 0
local_blks_written  | 0
temp_blks_read      | 0
temp_blks_written   | 0
blk_read_time       | 0
blk_write_time      | 0

Description of table:
# \d communication.request_parameter
                       Table "communication.request_parameter"
            Column             |       Type        | Collation | Nullable |                                    Default
-------------------------------+-------------------+-----------+----------+--------------------------------------------------------------------------------
 request_parameter_id          | integer           |           | not null | nextval('communication.request_parameter_request_parameter_id_seq'::regclass)
 request_id                    | integer           |           | not null |
 template_version_parameter_id | integer           |           | not null |
 parameter_value               | character varying |           |          |
Indexes:
    "request_parameter_pkey" PRIMARY KEY, btree (request_parameter_id)
    "request_parameter_parameter_value_idx" btree (parameter_value)
    "request_parameter_request_id_idx" btree (request_id)
    "request_parameter_template_version_parameter_id_idx" btree (template_version_parameter_id)
Foreign-key constraints:
    "request_parameter_request_id_fkey" FOREIGN KEY (request_id) REFERENCES communication.request(request_id)
    "request_parameter_template_version_parameter_id_fkey" FOREIGN KEY (template_version_parameter_id) REFERENCES communication.template_version_parameter(template_version_parameter_id)

This only happens in testing, on a cold-booted database. The test database is 
constructed with pg_dump and restored onto a fresh Postgres installation.
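
In case it is useful, this is roughly how the time could be broken down on the 
cold database (only a sketch; it assumes the statement can be wrapped in a 
rolled-back transaction and that the session has the privileges needed to 
enable track_io_timing):

-- Rough sketch: run the slow insert once with buffer and I/O statistics,
-- then roll back so the test data is left unchanged. Shared blocks "read"
-- (as opposed to "hit") indicate index and FK lookups coming from a cold cache.
SET track_io_timing = on;   -- adds I/O timing to the EXPLAIN output

BEGIN;
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO communication.request_parameter
       (request_id, template_version_parameter_id, parameter_value)
VALUES (1222, 1211, 122)
RETURNING request_parameter_id;
ROLLBACK;

If the cold cache turns out to be the whole story, the pg_prewarm extension 
could be used to load the table and its indexes into shared buffers right 
after the restore.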



Best Regards

Kim Carlsen



Postgres wont remove useless joins, when the UNIQUE index is partial

2019-01-10 Thread Kim Rose Carlsen
Hi

remove_useless_joins(..) does not prove uniqueness if the unique index is 
partial, and therefore won't remove the join even when no columns of the joined 
relation are referenced (see the example at the bottom).

I have been looking around the source code, and from what I can identify the 
problem seems to be that check_index_predicates(..) happens after 
remove_useless_joins(..), which therefore cannot see that the partial unique 
index is actually covered by the join condition.

From analyzejoins.c:612, rel_supports_distinctness(..):

        if (ind->unique && ind->immediate &&
            (ind->indpred == NIL || ind->predOK))
            return true;

But the problem is that ind->predOK is calculated in check_index_predicates(..), 
which runs later, so ind->predOK is always false when it is checked here.

I have tried adding check_index_predicates(..) to rel_supports_distinctness(..), 
and this produces the expected plan, but I have no idea of the implications of 
calling check_index_predicates(..) earlier.

This is my first time looking at the Postgres source code, so I know the 
attached "patch" is not the solution, but any pointers on where to go from here 
would be appreciated.


Example:
CREATE TABLE a (
  id INTEGER PRIMARY KEY,
  sub_id INTEGER NOT NULL,
  deleted_at TIMESTAMP
);
CREATE UNIQUE INDEX ON a (sub_id) WHERE (deleted_at IS NULL);

ANALYZE a;

EXPLAIN SELECT 1 FROM a AS a LEFT JOIN a AS b ON a.id = b.sub_id AND 
b.deleted_at IS NULL;

Expected plan:
                      QUERY PLAN
------------------------------------------------------
 Seq Scan on a  (cost=0.00..28.50 rows=1850 width=4)

Actual plan:
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Hash Left Join  (cost=14.76..48.13 rows=1850 width=4)
   Hash Cond: (a.id = b.sub_id)
   ->  Seq Scan on a  (cost=0.00..28.50 rows=1850 width=4)
   ->  Hash  (cost=14.65..14.65 rows=9 width=4)
         ->  Bitmap Heap Scan on a b  (cost=4.13..14.65 rows=9 width=4)
               Recheck Cond: (deleted_at IS NULL)
               ->  Bitmap Index Scan on a_sub_id_idx  (cost=0.00..4.13 rows=9 width=0)
(7 rows)
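
For contrast, a quick sketch of the non-partial case: joining on the full 
unique index (the primary key) instead, the planner can prove uniqueness 
without looking at the index predicate, and the join is removed as expected:

EXPLAIN SELECT 1 FROM a AS a LEFT JOIN a AS b ON a.id = b.id;
-- gives the same plan as the expected plan above:
--  Seq Scan on a  (cost=0.00..28.50 rows=1850 width=4)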


Best regards
Kim Carlsen
Hiper A/S
M: 71 99 42 00
diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 1593dbec21..12da689983 100644
--- a/src/backend/optimizer/plan/analyzejoins.c
+++ b/src/backend/optimizer/plan/analyzejoins.c
@@ -596,6 +596,7 @@ rel_supports_distinctness(PlannerInfo *root, RelOptInfo *rel)
         return false;
     if (rel->rtekind == RTE_RELATION)
     {
+        check_index_predicates(root, rel);
         /*
          * For a plain relation, we only know how to prove uniqueness by
          * reference to unique indexes.  Make sure there's at least one