Multi-second pauses blocking even trivial activity

2018-09-07 Thread Patrick Molgaard
Hi folks,

I've been seeing some curious behaviour on a postgres server I administer.

Intermittently (one or two times a week), all queries on that host are
simultaneously blocked for extended periods (10s of seconds).

The blocked queries are trivial & not related to locking - I'm seeing
slowlogs of the form:

`LOG: duration: 22627.299 ms statement: SET client_encoding='''utf-8''';`

where this is the first statement on a fresh connection.

It happens even for connections from the same host - so it doesn't appear
to be e.g. network slowness, if that is even counted in query duration.

Does anyone have any hints for where to look for a cause?

Thanks,

Patrick

-

Set up information:

Postgres version:  PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc
(Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609, 64-bit

Full Table and Index Schema: not applicable, I think

EXPLAIN ANALYSE: n/a

Hardware: AWS i3.2xlarge

Maintenance Setup: autovacuum yes, but the times it runs don't correlate to
the incidences of slow queries

WAL settings: shipped to S3 with wal-e, stored on same disk for interim
period


Re: Multi-second pauses blocking even trivial activity

2018-09-07 Thread Jeff Janes
On Fri, Sep 7, 2018 at 8:00 AM Patrick Molgaard  wrote:

> Hi folks,
>
> I've been seeing some curious behaviour on a postgres server I administer.
>
> Intermittently (one or two times a week), all queries on that host are
> simultaneously blocked for extended periods (10s of seconds).
>
> The blocked queries are trivial & not related to locking - I'm seeing
> slowlogs of the form:
>
> `LOG: duration: 22627.299 ms statement: SET client_encoding='''utf-8''';`
>
>
Do you have log_lock_waits set to on?  If not, you might want to turn it on.

Cheers,

Jeff


Partial index plan/cardinality costing

2018-09-07 Thread James Coleman
I have the following tables:
- m(pk bigserial primary key, status text): with a single row
- s(pk bigserial primary key, status text, action_at date, m_fk bigint):
  * 80% of the data has action_at between the current date and 1 year ago
 and status of E or C
  * 20% of the data has action_at between 5 days ago and 25 days into the
 future and status of P, PD, or A

I have two partial indexes:
- s_pk_action_at on s(pk, action_at) where status in ('P', 'PD', 'A')
- s_action_at_pk on s(action_at, pk) where status in ('P', 'PD', 'A')

With the query:
SELECT s.pk FROM s
INNER JOIN m ON m.pk = s.m_fk
WHERE
  s.status IN ('A', 'PD', 'P')
  AND (action_at <= '2018-09-06')
  AND s.status IN ('A', 'P')
  AND m.status = 'A';

I generally expect the index s_action_at_pk to always be preferred over
s_pk_action_at. And on stock Postgres it does in fact use that index (with
a bitmap index scan).

We like to set random_page_cost = 2 since we use fast SSDs only. With that
change Postgres strongly prefers the index s_pk_action_at unless I both
disable the other index and turn off bitmap heap scans.

I'm attaching the following plans:
- base_plan.txt: default costs; both indexes available
- base_plan_rpc2.txt: random_page_cost = 2; both indexes available
- inddisabled_plan_rpc2.txt: random_page_cost = 2; only s_action_at_pk
available
- inddisabled_bhsoff_plan_rpc2.txt: random_page_cost = 2; enable_bitmapscan
= false;  only s_action_at_pk available

A couple of questions:
- How is s_pk_action_at ever efficient to scan? Given that the highest
cardinality (primary key) column is first, wouldn't an index scan
effectively have to scan the entire index?
- Why does index scan on s_action_at_pk reads over 2x as many blocks as the
bitmap heap scan with the same index?
- Would you expect Postgres to generally always prefer using the
s_action_at_pk index over the s_pk_action_at index for this query? I
realize changing the random page cost is part of what's driving this, but I
still can't imagine reading the full s_pk_action_at index (assuming that's
what it is doing) could ever be more valuable.

As a side note, the planner is very bad at understanding a query that
happens (I realize you wouldn't write this by hand, but ORMs) when you have
a where clause like:
s.status IN ('A', 'PD', 'P') AND s.status IN ('A', 'P')
the row estimates are significantly different from a where clause with only:
s.status IN ('A', 'P')
even though semantically those are identical.
 QUERY PLAN 


 Nested Loop  (cost=0.42..8795.84 rows=11004 width=8) (actual 
time=0.020..12.338 rows=8919 loops=1)
   Join Filter: (s.m_fk = m.pk)
   Buffers: shared hit=8234
   ->  Seq Scan on m  (cost=0.00..1.01 rows=1 width=8) (actual 
time=0.004..0.005 rows=1 loops=1)
 Filter: (status = 'A'::text)
 Buffers: shared hit=1
   ->  Index Scan using s_action_at_pk on s  (cost=0.42..8657.28 rows=11004 
width=16) (actual time=0.014..11.008 rows=8919 loops=1)
 Index Cond: (action_at <= '2018-09-06'::date)
 Filter: (status = ANY ('{A,P}'::text[]))
 Rows Removed by Filter: 4483
 Buffers: shared hit=8233
 Planning Time: 0.502 ms
 Execution Time: 12.793 ms
(13 rows)
   QUERY PLAN   


 Nested Loop  (cost=1900.77..6768.34 rows=11004 width=8) (actual 
time=1.345..11.812 rows=8919 loops=1)
   Join Filter: (s.m_fk = m.pk)
   Buffers: shared hit=3184
   ->  Seq Scan on m  (cost=0.00..1.01 rows=1 width=8) (actual 
time=0.005..0.006 rows=1 loops=1)
 Filter: (status = 'A'::text)
 Buffers: shared hit=1
   ->  Bitmap Heap Scan on s  (cost=1900.77..6629.78 rows=11004 width=16) 
(actual time=1.337..10.515 rows=8919 loops=1)
 Recheck Cond: ((action_at <= '2018-09-06'::date) AND (status = ANY 
('{P,PD,A}'::text[])))
 Filter: (status = ANY ('{A,P}'::text[]))
 Rows Removed by Filter: 4483
 Heap Blocks: exact=3129
 Buffers: shared hit=3183
 ->  Bitmap Index Scan on s_action_at_pk  (cost=0.00..1898.02 
rows=82347 width=0) (actual time=0.995..0.995 rows=13402 loops=1)
   Index Cond: (action_at <= '2018-09-06'::date)
   Buffers: shared hit=54
 Planning Time: 0.719 ms
 Execution Time: 12.288 ms
(17 rows)
 QUERY PLAN 


Re: Multi-second pauses blocking even trivial activity

2018-09-07 Thread Patrick Molgaard
Hi Jeff,

Thanks for your reply. Are locks relevant in this case, though?

To be clear, the slow statements are the first thing happening on the
connection and don't look like they should be acquiring any kind of lock -
eg. 'select version();' also seems to be paused when it occurs.

Or are there some system level locks that a trivial query, touching no
relations, might be contending for?

Best
Patrick

On Fri, 7 Sep 2018, 15:32 Jeff Janes,  wrote:

> On Fri, Sep 7, 2018 at 8:00 AM Patrick Molgaard 
> wrote:
>
>> Hi folks,
>>
>> I've been seeing some curious behaviour on a postgres server I administer.
>>
>> Intermittently (one or two times a week), all queries on that host are
>> simultaneously blocked for extended periods (10s of seconds).
>>
>> The blocked queries are trivial & not related to locking - I'm seeing
>> slowlogs of the form:
>>
>> `LOG: duration: 22627.299 ms statement: SET client_encoding='''utf-8''';`
>>
>>
> Do you have log_lock_waits set to on?  If not, you might want to turn it
> on.
>
> Cheers,
>
> Jeff
>


Re: Multi-second pauses blocking even trivial activity

2018-09-07 Thread Andreas Kretschmer


>
>Intermittently (one or two times a week), all queries on that host are
>simultaneously blocked for extended periods (10s of seconds).
>
>The blocked queries are trivial & not related to locking - I'm seeing
>slowlogs of the form:
>


please check if THP are enabled.


Regards, Andreas



-- 
2ndQuadrant - The PostgreSQL Support Company



Re: Multi-second pauses blocking even trivial activity

2018-09-07 Thread Joshua D. Drake

On 09/07/2018 11:12 AM, Andreas Kretschmer wrote:

Intermittently (one or two times a week), all queries on that host are
simultaneously blocked for extended periods (10s of seconds).

The blocked queries are trivial & not related to locking - I'm seeing
slowlogs of the form:



please check if THP are enabled.


Just to help out those who don't know I believe that Andreas is 
referring to Transparent Huge Pages.


JD




Regards, Andreas





--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc
***  A fault and talent of mine is to tell it exactly how it is.  ***
PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Multi-second pauses blocking even trivial activity

2018-09-07 Thread Jeff Janes
On Fri, Sep 7, 2018 at 2:03 PM Patrick Molgaard  wrote:

>
> Hi Jeff,
>
> Thanks for your reply. Are locks relevant in this case, though?
>

I don't know, but why theorize when we can know for sure?  It at least
invokes VirtualXactLockTableInsert.  I don't see how that could block on a
heavyweight lock, though. But again, why theorize when logging it is simple?

Is it always the first statement in a connection which is blocking, or will
established connections also block at the same time the new ones start to
block?

Cheers,

Jeff

>


GIN Index has O(N^2) complexity for array overlap operator?

2018-09-07 Thread Felix Geisendörfer
Hi everybody,

I ran into an issue with using the && array operator on a GIN index of mine. 
Basically I have a query that looks like this:

SELECT * FROM example WHERE keys && ARRAY[...];

This works fine for a small number of array elements (N), but gets really slow 
as N gets bigger in what appears to be O(N^2) complexity.

However, from studying the GIN data structure as described by the docs, it 
seems that the performance for this could be O(N). In fact, it's possible to 
coerce the query planner into an O(N) plan like this:

SELECT DISTINCT ON (example.id) * FROM unnest(ARRAY[...]) key JOIN example ON 
keys && ARRAY[key]

In order to illustrate this better, I've created a jupyter notebook that 
populates an example table, show the query plans for both queries, and most 
importantly benchmarks them and plots a time vs array size (N) graph.

https://github.com/felixge/pg-slow-gin/blob/master/pg-slow-gin.ipynb 


Please help me understand what causes the O(N^2) performance for query 1 and if 
query 2 is the best way to work around this issue.

Thanks
Felix Geisendörfer

PS: I'm using Postgres 10, but also verified that this problem exists with 
Postgres 11.

Performance of INSERT into temporary tables using psqlODBC driver

2018-09-07 Thread padusuma
I am working on adding support for PostgreSQL database for our application.
In a lot of our use-cases, data is inserted into temporary tables using
INSERT INTO statements with bind parameters, and subsequently queries are
run by joining to these temp tables. Following is some of the data for these
INSERT statements:

Table definition: CREATE TEMPORARY TABLE Table1( auid varchar(15) ) ON
COMMIT DELETE ROWS;

SQL statement: INSERT INTO Table1 (uidcol) VALUES (:1);

Time taken to insert 24428 rows: 10.077 sec
Time taken to insert 32512 rows: 16.026 sec
Time taken to insert 32512 rows: 15.821 sec
Time taken to insert  6107 rows: 1.514 sec

I am looking for suggestions to improve the performance of these INSERT
statements into temporary tables. Database is located on a Linux VM and the
version is "PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit". The application is running on a
windows platform and connecting to the database using psqlODBC driver
version 10.03.

Please let me know if any additional information is needed.



--
Sent from: 
http://www.postgresql-archive.org/PostgreSQL-performance-f2050081.html



Re: Performance of INSERT into temporary tables using psqlODBC driver

2018-09-07 Thread Tim Cross


padusuma  writes:

> I am working on adding support for PostgreSQL database for our application.
> In a lot of our use-cases, data is inserted into temporary tables using
> INSERT INTO statements with bind parameters, and subsequently queries are
> run by joining to these temp tables. Following is some of the data for these
> INSERT statements:
>
> Table definition: CREATE TEMPORARY TABLE Table1( auid varchar(15) ) ON
> COMMIT DELETE ROWS;
>
> SQL statement: INSERT INTO Table1 (uidcol) VALUES (:1);
>
> Time taken to insert 24428 rows: 10.077 sec
> Time taken to insert 32512 rows: 16.026 sec
> Time taken to insert 32512 rows: 15.821 sec
> Time taken to insert  6107 rows: 1.514 sec
>
> I am looking for suggestions to improve the performance of these INSERT
> statements into temporary tables. Database is located on a Linux VM and the
> version is "PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 4.4.7 20120313 (Red Hat 4.4.7-18), 64-bit". The application is running on a
> windows platform and connecting to the database using psqlODBC driver
> version 10.03.
>

We are inserting large numbers (millions) of rows into a postgres
database from a Javascript application and found using the COPY command
was much, much faster than doing regular inserts (even with multi-insert
commit). If you can do this using the driver you are using, that will
give you the largest performance boost. 


-- 
Tim Cross



Re: Multi-second pauses blocking even trivial activity

2018-09-07 Thread Patrick Molgaard
Oh, to be clear - I'll be implementing your suggestion regardless, it seems
valuable whether or not it gets me closer to the root cause this time :)

I was just trying to dig into why it may be relevant -- I want to really
get a good grip on the mechanism behind this phenomenon.

Cheers
Patrick
On Fri, 7 Sep 2018, 20:20 Jeff Janes,  wrote:

> On Fri, Sep 7, 2018 at 2:03 PM Patrick Molgaard 
> wrote:
>
>>
>> Hi Jeff,
>>
>> Thanks for your reply. Are locks relevant in this case, though?
>>
>
> I don't know, but why theorize when we can know for sure?  It at least
> invokes VirtualXactLockTableInsert.  I don't see how that could block on a
> heavyweight lock, though. But again, why theorize when logging it is simple?
>
> Is it always the first statement in a connection which is blocking, or
> will established connections also block at the same time the new ones start
> to block?
>
> Cheers,
>
> Jeff
>
>>


Re: Multi-second pauses blocking even trivial activity

2018-09-07 Thread Patrick Molgaard
This sounds extremely plausible -- thanks for the tip, Andreas.

Best,
Patrick

On Fri, 7 Sep 2018, 19:20 Andreas Kretschmer, 
wrote:

>
> >
> >Intermittently (one or two times a week), all queries on that host are
> >simultaneously blocked for extended periods (10s of seconds).
> >
> >The blocked queries are trivial & not related to locking - I'm seeing
> >slowlogs of the form:
> >
>
>
> please check if THP are enabled.
>
>
> Regards, Andreas
>
>
>
> --
> 2ndQuadrant - The PostgreSQL Support Company
>
>