date:20201022

Re: Query Performance / Planner estimate off

2020-10-22 Thread Mats Olsen

On 10/22/20 8:37 AM, Justin Pryzby wrote:

On Wed, Oct 21, 2020 at 04:42:02PM +0200, Mats Olsen wrote:

On 10/21/20 2:38 PM, Sebastian Dressler wrote:

On 20. Oct 2020, at 11:37, Mats Julian Olsen https://explain.depesz.com/s/NvDR

2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK

3) enable_nestloop=off; enable_seqscan=off (2 min):
https://explain.depesz.com/s/0WXx

How can I get Postgres not to loop over 12M rows?

I looked at the plans and your config and there are some thoughts I'm
having:

- The row estimate is off, as you possibly noticed. This can be possibly
solved by raising `default_statistics_target` to e.g. 2500 (we typically
use that) and run ANALYZE

I've `set default_statistics_target=2500` and ran analyze on both tables
involved, unfortunately the plan is the same. The columns we use for joining
here are hashes and we expect very few duplicates in the tables. Hence I
think extended statistics (storing most common values and histogram bounds)
aren't useful for this kind of data. Would you say the same thing?

In postgres, extended statistics means "MV stats objects", not MCV+histogram,
which are "simple statistics", like ndistinct.

Your indexes maybe aren't ideal for this query, as mentioned.
The indexes that do exist might also be inefficient, due to being unclustered,
or bloated, or due to multiple columns.

This table is append-only, i.e. no updates. The partitions are clustered
on a btree index on block_time `

"transactions_p50_block_time_idx" btree (block_time) CLUSTER

These look redundant (which doesn't matter for this the query):

Partition key: RANGE (block_number)
Indexes:
"transactions_block_number_btree" btree (block_number DESC)
"transactions_block_number_hash_key" UNIQUE CONSTRAINT, btree
(block_number, hash)
"transactions_block_number_time" btree (hash, block_number)

Maybe that would be an index just on "hash", which might help here.

Possibly you'd want to try to use a BRIN index on timestamp (or maybe
block_number?).

Yeah this could be a good idea, but the size of this table doesn't let
me add any indexes while it's online. I'll revisit these the next time
we redeploy the database.

Maybe you'd want to VACUUM the table to allow index-only scan on the hash
columns ?

Maybe you'd want to check if reindexing reduces the index size ? We don't know
if the table gets lots of UPDATE/DELETE or if any of the columns have high
logical vs physical "correlation".
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

Have you ANALYZED the partitioned parent recently ?
This isn't handled by autovacuum.

As mentioned above there aren't any updates or deletes to this table.
Both tables have been ANALYZEd. I ran that query and the output is here
https://gist.github.com/mewwts/86ef43ff82120e104a654cd7fbb5ec06. I ran
it for the two specific columns and all partitions for the transactions
table, and for all columns on "Pair_evt_Mint". Does these values tell
you anything?

Re: Query Performance / Planner estimate off

2020-10-22 Thread Justin Pryzby

On Thu, Oct 22, 2020 at 09:36:03AM +0200, Mats Olsen wrote:
> On 10/22/20 8:37 AM, Justin Pryzby wrote:
> > These look redundant (which doesn't matter for this the query):
> > 
> > Partition key: RANGE (block_number)
> > Indexes:
> >  "transactions_block_number_btree" btree (block_number DESC)
> >  "transactions_block_number_hash_key" UNIQUE CONSTRAINT, btree 
> > (block_number, hash)
> >  "transactions_block_number_time" btree (hash, block_number)
> > 
> > Maybe that would be an index just on "hash", which might help here.
> > 
> > Possibly you'd want to try to use a BRIN index on timestamp (or maybe
> > block_number?).
> 
> Yeah this could be a good idea, but the size of this table doesn't let me
> add any indexes while it's online. I'll revisit these the next time we
> redeploy the database.

Why not CREATE INDEX CONCURRENTLY ?
It seems to me you could add BRIN on all correlated indexes.  It's nearly free.

  0.102922715 | Pair_evt_Mint | evt_block_time   | f | 0 | 
-0.56466025 | 1 |  10001 |  0.964666
   0.06872191 | Pair_evt_Mint | evt_block_time   | f | 0 |  
-0.8379525 |   500 |501 |   0.99982
   0.06872191 | Pair_evt_Mint | evt_block_number | f | 0 |  
-0.8379525 |   500 |501 |   0.99982
  0.032878816 | Pair_evt_Mint | evt_block_number | f | 0 | 
-0.56466025 |  2500 |   2501 |  0.964666

> > Maybe you'd want to VACUUM the table to allow index-only scan on the hash
> > columns ?

Did you try it ? I think this could be a big win.
Since it's append-only, autovacuum won't hit it (until you upgrade to pg13).

-- 
Justin

Re: Query Performance / Planner estimate off

Re: Query Performance / Planner estimate off

2 matches

Site Navigation

Mail list logo

Footer information