Queue table that quickly grows causes query planner to choose poor plan

2018-06-27 Thread David Wheeler
Hi All,

I’m having performance trouble with a particular set of queries. It goes a bit 
like this

1) queue table is initially empty, and very narrow (1 bigint column)
2) we insert ~30 million rows into queue table
3) we do a join with queue table to delete from another table (delete from a 
using queue where a.id = queue.id), but postgres stats say that queue table is 
empty, so it uses a nested loop over all 30 million rows, taking forever

If I kill the query in 3 and let it run again after autoanalyze has done its 
thing, then it is very quick.

This queue table is empty 99% of the time, and the query in 3 runs immediately 
after step 2. Is there any likelihood that tweaking the autoanalyze params 
would help in this case? I don’t want to explicitly analyze the table between 
steps 2 and 3 either, as there are other patterns of use where, for example, 0 
rows are inserted in step 2 and this is expected to run very quickly. Do I 
have any other options?
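
(For concreteness, the kind of autoanalyze tweak I have in mind is per-table 
storage parameters, roughly like the sketch below; the table name and 
threshold are placeholders, not settled values.)

-- Sketch only: make autoanalyze trigger on an absolute number of changed rows
-- rather than on a fraction of a (supposedly empty) table.
ALTER TABLE queue SET (
    autovacuum_analyze_scale_factor = 0,
    autovacuum_analyze_threshold    = 100000
);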

Postgres 9.5 ATM, but an upgrade is in planning.


Thanks in advance

David Wheeler
Software developer



E [email protected]<mailto:[email protected]>
D +61 3 9663 3554  W http://dgitsystems.com
Level 8, 620 Bourke St, Melbourne VIC 3000.




Re: Queue table that quickly grows causes query planner to choose poor plan

2018-06-27 Thread David Wheeler
Hi Tom,

Thanks for your reply, that’s very helpful and informative.

> Although there's no way to have any useful pg_statistic stats if you won't
> do an ANALYZE, the planner nonetheless can see the table's current
> physical size, and what it normally does is to multiply the last-reported
> tuple density (reltuples/relpages) by the current size.  So if you're
> getting an "empty table" estimate anyway, I have to suppose that the
> table's state involves reltuples = 0 and relpages > 0.  That's not a
> good place to be in; it constrains the planner to believe that the table
> is in fact devoid of tuples, because that's what the last ANALYZE saw.

That appears to be correct. I assumed that because the table had been analyzed 
and found to be empty, autovacuum would probably have cleared all the tuples 
too, but that’s not the case.

 relpages |  reltuples
----------+-------------
        0 | 2.33795e+06
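
(Those numbers are from pg_class; the query was roughly the following, 
assuming the queue table is literally named queue.)

SELECT relpages, reltuples
FROM pg_class
WHERE relname = 'queue';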

> I am not following your aversion to sticking an ANALYZE in there,
> either.  It's not like inserting 30 million rows would be free.

There are many usage profiles for these tables. Sometimes there will be a 
single insert of 30 million rows, sometimes there will be several inserts of 
up to 100 million rows each in different threads, and sometimes there will be 
many (~80,000) inserts of 0 rows (for which an ANALYSE is simply a waste). I 
don’t want to cause an undue performance penalty on the other usage profiles.

But as Justin rightly points out I can selectively ANALYSE only when > x rows 
are inserted, which I think is the best way forward.
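
To sketch what I mean (the table name, the batch source and the 100 000 
threshold are all placeholders):

DO $$
DECLARE
    inserted bigint;
BEGIN
    -- hypothetical batch load into the queue table
    INSERT INTO queue (id)
    SELECT id FROM incoming_batch;

    GET DIAGNOSTICS inserted = ROW_COUNT;

    -- only pay for the ANALYZE when the batch was big enough to matter;
    -- ANALYZE (unlike VACUUM) is allowed inside a transaction block
    IF inserted > 100000 THEN
        ANALYZE queue;
    END IF;
END;
$$;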

David Wheeler
Software developer



E [email protected]<mailto:[email protected]>
D +61 3 9663 3554  W http://dgitsystems.com
Level 8, 620 Bourke St, Melbourne VIC 3000.


On 28 Jun 2018, at 4:27 am, Tom Lane wrote:

David Wheeler writes:
> I'm having performance trouble with a particular set of queries. It goes a bit
> like this
>
> 1) queue table is initially empty, and very narrow (1 bigint column)
> 2) we insert ~30 million rows into queue table
> 3) we do a join with queue table to delete from another table (delete from a
> using queue where a.id = queue.id), but postgres stats say that queue
> table is empty, so it uses a nested loop over all 30 million rows, taking
> forever

Although there's no way to have any useful pg_statistic stats if you won't
do an ANALYZE, the planner nonetheless can see the table's current
physical size, and what it normally does is to multiply the last-reported
tuple density (reltuples/relpages) by the current size.  So if you're
getting an "empty table" estimate anyway, I have to suppose that the
table's state involves reltuples = 0 and relpages > 0.  That's not a
good place to be in; it constrains the planner to believe that the table
is in fact devoid of tuples, because that's what the last ANALYZE saw.

Now, the initial state for a freshly-created or freshly-truncated table
is *not* that.  It is reltuples = 0 and relpages = 0, representing an
undefined tuple density.  Given that, the planner will make some guess
about average tuple size --- which is likely to be a very good guess,
for a table with only fixed-width columns --- and then compute a rowcount
estimate using that plus the observed physical size.

So I think your problem comes from oscillating between really-empty
and not-at-all-empty, and not using an idiomatic way of going back
to the empty state.  Have you tried using TRUNCATE instead of DELETE?
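
A minimal sketch of that idea, assuming the table is literally named queue:

-- Going back to "empty" with TRUNCATE rather than DELETE resets relpages and
-- reltuples to 0 (undefined tuple density), so the planner estimates from the
-- current physical size instead of assuming the table has no rows.
TRUNCATE TABLE queue;
-- as opposed to: DELETE FROM queue;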

> This queue table is empty 99% of the time, and the query in 3 runs immediately
> after step 2. Is there any likelihood that tweaking the autoanalyze params
> would help in this case? I don't want to explicitly analyze the table between
> steps 2 and 3 either, as there are other patterns of use where, for example, 0
> rows are inserted in step 2 and this is expected to run very quickly. Do I
> have any other options?

I am not following your aversion to sticking an ANALYZE in there,
either.  It's not like inserting 30 million rows would be free.

regards, tom lane




Slow planning, fast execution for particular 3-table query

2019-11-03 Thread David Wheeler
We’re having trouble working out why the planning time for this particular 
query is slow (~2.5s vs 0.9ms execution time). As you can see below, there are 
only 3 tables involved so it’s hard to imagine what decisions the planner has 
to make that take so long. After 5 runs the prepared-statement code kicks in 
and it becomes quick, but it’s quite infuriating for the first 5 runs given the 
execution is so much faster.

Are you able to give any tips what might be taking so long (and how we might 
improve it)?

We read elsewhere that someone had a “catalog stats file leak”, which I’m 
taking to mean a big pg_statistic table. Ours is 10 MB, which doesn’t seem 
particularly large to me, but I don’t have much context for it. 
https://www.postgresql.org/message-id/CABWW-d21z_WgawkjXFQQviqm16oAx0KQvR6bLkRxvYQmhdByfg%40mail.gmail.com
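
(For reference, a quick way to check that size is something like this.)

SELECT pg_size_pretty(pg_total_relation_size('pg_statistic'));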

Other queries (with 3 or more tables) in the same db seem to be planning much 
quicker.

The query:

explain (analyse) SELECT subscription_binding.subscription_binding,
  subscription_binding.tid,
  subscription.subscription_uuid,
  subscription_binding.subscription_binding_uuid,
  binding.binding_uuid,
  subscription_binding.start_time,
  subscription_binding.end_time,
  subscription_binding.timezone,
  now() >= subscription_binding.start_time AND (subscription_binding.end_time IS NULL OR now() <= subscription_binding.end_time) AS active
FROM jackpot.binding
  JOIN jackpot.subscription_binding USING (tid, binding)
  JOIN jackpot.subscription USING (tid, subscription)
where (tid=2082003407) AND (binding_uuid='4f61dcd5-97a0-4098-b9ae-c1546c31b2e6'::uuid) offset 0 limit 1000;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.29..25.38 rows=1 width=80) (actual time=0.770..0.771 rows=1 loops=1)
   ->  Nested Loop  (cost=1.29..25.38 rows=1 width=80) (actual time=0.770..0.771 rows=1 loops=1)
         ->  Nested Loop  (cost=0.86..16.91 rows=1 width=76) (actual time=0.697..0.698 rows=1 loops=1)
               ->  Index Scan using binding_tid_binding_uuid_key on binding  (cost=0.43..8.45 rows=1 width=28) (actual time=0.647..0.647 rows=1 loops=1)
                     Index Cond: ((tid = 2082003407) AND (binding_uuid = '4f61dcd5-97a0-4098-b9ae-c1546c31b2e6'::uuid))
               ->  Index Scan using subscription_binding_idx on subscription_binding  (cost=0.43..8.45 rows=1 width=64) (actual time=0.045..0.046 rows=1 loops=1)
                     Index Cond: ((tid = 2082003407) AND (binding = binding.binding))
         ->  Index Scan using subscription_pkey on subscription  (cost=0.43..8.45 rows=1 width=28) (actual time=0.068..0.068 rows=1 loops=1)
               Index Cond: ((tid = 2082003407) AND (subscription = subscription_binding.subscription))
 Planning time: 2429.682 ms
 Execution time: 0.914 ms
(11 rows)

Postgres version 9.5.19

Each of the tables has between 3 and 4 indexes, and all the indexes include 
tid as the first column. No partitions, no sign of a stray replication slot / 
uncommitted transaction / prepared transaction that may be holding up autovac, 
no sign of bloated indexes.
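
(Roughly the sort of checks behind that statement; not exhaustive.)

SELECT * FROM pg_replication_slots;   -- stray replication slots
SELECT * FROM pg_prepared_xacts;      -- forgotten prepared transactions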

TIA!

Best regards,

David Wheeler
General Manager Bali Office
Bali T +62 361 475 2333   M +62 819 3660 9180
E [email protected]<mailto:[email protected]>
Jl. Pura Mertasari No. 7, Sunset Road Abian Base
Kuta, Badung – Bali 80361, Indonesia
http://www.dgitsystems.com<http://www.dgitsystems.com/>





Re: Slow planning, fast execution for particular 3-table query

2019-11-03 Thread David Wheeler
I'm not sure what "unusually large" is, but they're all < 1 MB, which is a 
little larger than in some of our other comparable databases (mostly < 300 kB) 
but seems reasonable to me.

Regards, 
 
David

On 4/11/19, 3:37 pm, "Laurenz Albe" wrote:

On Mon, 2019-11-04 at 03:04, David Wheeler wrote:
> We’re having trouble working out why the planning time for this particular query is slow
> (~2.5s vs 0.9ms execution time). As you can see below, there are only 3 tables involved
> so it’s hard to imagine what decisions the planner has to make that take so long. After
> 5 runs the prepared-statement code kicks in and it becomes quick, but it’s quite
> infuriating for the first 5 runs given the execution is so much faster.
>
> Are you able to give any tips what might be taking so long (and how we might improve it)?
>
[...]
> Planning time: 2429.682 ms
>
> Execution time: 0.914 ms

Strange.
Are any of your catalog tables unusually large?

SELECT pg_relation_size(t.oid),
   t.relname
FROM pg_class AS t
   JOIN pg_namespace AS n ON t.relnamespace = n.oid
WHERE t.relkind = 'r'
ORDER BY pg_relation_size(t.oid) DESC
LIMIT 10;

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Slow planning, fast execution for particular 3-table query

2019-11-03 Thread David Wheeler
>To see this issue, you have to have recently
>inserted or deleted a bunch of extremal values of the indexed join-key
>column.  And the problem only persists until those values become known
>committed-good, or known dead-to-everybody.  (Maybe you've got a
>long-running transaction somewhere, postponing the dead-to-everybody
>condition?)

There are no long-running transactions that have backend_xmin set in 
pg_stat_activity, if that's what you mean here. There are also no open prepared 
transactions or replication slots which I understand have a similar 
keeping-things-alive issue. 
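
(For reference, a query along these lines shows any backends holding back the 
xmin horizon.)

SELECT pid, datname, state, xact_start, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;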

These tables are biggish (hundreds of MB), but not changing so frequently that 
I'd expect large quantities of data to be inserted or deleted before autovac 
can get in there and clean it up. And certainly not in a single uncommitted 
transaction.

I'll try reindexing each of the tables just to make sure it's not strange index 
imbalance or something causing the issue. 
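
(Something like the following, using the table names from the query above; a 
sketch only.)

REINDEX TABLE jackpot.binding;
REINDEX TABLE jackpot.subscription_binding;
REINDEX TABLE jackpot.subscription;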

Regards, 
 
David

On 4/11/19, 4:01 pm, "Tom Lane" wrote:

David Wheeler writes:
> We’re having trouble working out why the planning time for this
> particular query is slow (~2.5s vs 0.9ms execution time). As you can see
> below, there are only 3 tables involved so it’s hard to imagine what
> decisions the planner has to make that take so long.

I wonder whether this traces to the cost of trying to estimate the
largest/smallest value of an indexed column by looking into the index.
Normally that's pretty cheap, but if you have a lot of recently-inserted
or recently-deleted values at the end of the index, it can get painful.
AFAIR this only happens for columns that are equijoin keys, so the fact
that your query is a join is significant.

I'm not convinced that this is the problem, because it's a corner case
that few people hit.  To see this issue, you have to have recently
inserted or deleted a bunch of extremal values of the indexed join-key
column.  And the problem only persists until those values become known
committed-good, or known dead-to-everybody.  (Maybe you've got a
long-running transaction somewhere, postponing the dead-to-everybody
condition?)

> Postgres version 9.5.19

If this *is* the cause, v11 and up have a performance improvement that
you need:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3

regards, tom lane




Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread David Wheeler
>> I'll try reindexing each of the tables just to make sure it's not strange 
>> index imbalance or something causing the issue.
> I've seen this issue a few times - and reindex helps.

Alas our reindex doesn’t seem to have helped. I’m going to see if we can 
reproduce this on a non-prod environment so we can muck about a bit more. If we 
can reproduce it in a safe place, is there a tool we can use to get more info 
out of the query planner to find what it’s doing to take so long?

Regards,

David

From: Pavel Stehule
Date: Monday, 4 November 2019 at 4:53 pm
To: David Wheeler
Cc: Tom Lane, [email protected], Cameron Redpath
Subject: Re: Slow planning, fast execution for particular 3-table query



On Mon, 4 Nov 2019 at 6:17, David Wheeler wrote:
>To see this issue, you have to have recently
>inserted or deleted a bunch of extremal values of the indexed join-key
>column.  And the problem only persists until those values become known
>committed-good, or known dead-to-everybody.  (Maybe you've got a
>long-running transaction somewhere, postponing the dead-to-everybody
>condition?)

There are no long-running transactions that have backend_xmin set in 
pg_stat_activity, if that's what you mean here. There are also no open prepared 
transactions or replication slots which I understand have a similar 
keeping-things-alive issue.

These tables are biggish (hundreds of mb), but not changing so frequently that 
I'd expect large quantities of data to be inserted or deleted before autovac 
can get in there and clean it up. And certainly not in a single uncommitted 
transaction.

I'll try reindexing each of the tables just to make sure it's not strange index 
imbalance or something causing the issue.

I've seen this issue a few times - and reindex helps.

Pavel


Regards,

David

On 4/11/19, 4:01 pm, "Tom Lane" wrote:

David Wheeler writes:
> We’re having trouble working out why the planning time for this
> particular query is slow (~2.5s vs 0.9ms execution time). As you can see
> below, there are only 3 tables involved so it’s hard to imagine what
> decisions the planner has to make that take so long.

I wonder whether this traces to the cost of trying to estimate the
largest/smallest value of an indexed column by looking into the index.
Normally that's pretty cheap, but if you have a lot of recently-inserted
or recently-deleted values at the end of the index, it can get painful.
AFAIR this only happens for columns that are equijoin keys, so the fact
that your query is a join is significant.

I'm not convinced that this is the problem, because it's a corner case
that few people hit.  To see this issue, you have to have recently
inserted or deleted a bunch of extremal values of the indexed join-key
column.  And the problem only persists until those values become known
committed-good, or known dead-to-everybody.  (Maybe you've got a
long-running transaction somewhere, postponing the dead-to-everybody
condition?)

> Postgres version 9.5.19

If this *is* the cause, v11 and up have a performance improvement that
you need:

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=3ca930fc3

regards, tom lane



Re: Slow planning, fast execution for particular 3-table query

2019-11-06 Thread David Wheeler
> Is default_statistics_target set above default 100? I would assume that would
> reflect in the size of pg_statistic, but wanted to ask since increasing that
> from 100 to 1000 was the only time I have seen planning time explode. Are other
> queries slow to plan?

Looks like you’ve found it! Someone has set the target to 10k so that’s going 
to wildly increase planning time.
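
(For anyone hitting the same thing, checking where the setting comes from and 
winding it back looks roughly like this; the database name is a placeholder, 
and the setting may instead live in postgresql.conf or on a role.)

SHOW default_statistics_target;
SELECT name, setting, source FROM pg_settings WHERE name = 'default_statistics_target';

-- Example: if it was set at the database level
ALTER DATABASE mydb SET default_statistics_target = 100;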

Thanks for your help! And thanks to the others who chipped in along the way 😊

Regards,

David



Re: Plan not skipping unnecessary inner join

2020-05-13 Thread David Wheeler
> Am I missing a situation where these joins could impact the result?

Yes, it will impact the number of rows in the result. For example, if foo is 
empty then Postgres is required to return no rows from the inner join, 
regardless of how many rows are in bar. That is why it can remove the table 
from the left join but not from the inner join.

— David

> On 14 May 2020, at 1:44 pm, Matthew Nelson  wrote:
> 
> I noticed something peculiar while optimizing complex views today. The query 
> planner does not skip inner joins that, to my understanding, can have no 
> impact on the result. Am I missing a situation where these joins could impact 
> the result?
> 
> The following demonstrates the problem without the complex views. It also 
> demonstrates how the planner simplifies a LEFT JOIN in the same situation. 
> The left and right sides of an inner join could be swapped, obviously, but 
> here I kept the unique constraint on the right.
> 
> 
> 
> CREATE TABLE foo (
>id INTEGER PRIMARY KEY
> );
> 
> CREATE TABLE bar (
>foo_id INTEGER NOT NULL REFERENCES foo
> );
> 
> -- This simplifies to SELECT COUNT(*) FROM bar;
> EXPLAIN SELECT COUNT(*)
> FROM bar
> LEFT JOIN foo ON bar.foo_id = foo.id;
> 
> -- This should simplify to SELECT COUNT(*) FROM bar WHERE foo_id IS NOT NULL;
> -- The presence of a NOT NULL constraint on foo_id has no effect.
> EXPLAIN SELECT COUNT(*)
> FROM bar
> INNER JOIN foo ON bar.foo_id = foo.id;
> 
> 
> 
> QUERY PLAN  
> -
> Aggregate  (cost=38.25..38.26 rows=1 width=8)
>   ->  Seq Scan on bar  (cost=0.00..32.60 rows=2260 width=0)
> (2 rows)
> 
>   QUERY PLAN
> -
> Aggregate  (cost=111.57..111.58 rows=1 width=8)
>   ->  Hash Join  (cost=67.38..105.92 rows=2260 width=0)
> Hash Cond: (bar.foo_id_not_null = foo.id)
> ->  Seq Scan on bar  (cost=0.00..32.60 rows=2260 width=4)
> ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
>   ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=4)
> (6 rows)
> 
>  version
> ---------------------------------------------------------------------------------------------------------
> PostgreSQL 12.2 on x86_64-apple-darwin19.4.0, compiled by Apple clang version 11.0.3 (clang-1103.0.32.59), 64-bit
> (1 row)
> 
>