Re: Query Performance / Planner estimate off

2020-10-22 Thread Mats Olsen

On 10/22/20 8:37 AM, Justin Pryzby wrote:

On Wed, Oct 21, 2020 at 04:42:02PM +0200, Mats Olsen wrote:

On 10/21/20 2:38 PM, Sebastian Dressler wrote:

On 20. Oct 2020, at 11:37, Mats Julian Olsen https://explain.depesz.com/s/NvDR

2) enable_nestloop=off (4 min): https://explain.depesz.com/s/buKK

3) enable_nestloop=off; enable_seqscan=off (2 min):
https://explain.depesz.com/s/0WXx


How can I get Postgres not to loop over 12M rows?

I looked at the plans and your config and there are some thoughts I'm
having:

- The row estimate is off, as you possibly noticed. This can be possibly
solved by raising `default_statistics_target` to e.g. 2500 (we typically
use that) and run ANALYZE

I've `set default_statistics_target=2500` and ran analyze on both tables
involved, unfortunately the plan is the same. The columns we use for joining
here are hashes and we expect very few duplicates in the tables. Hence I
think extended statistics (storing most common values and histogram bounds)
aren't useful for this kind of data. Would you say the same thing?

In postgres, extended statistics means "MV stats objects", not MCV+histogram,
which are "simple statistics", like ndistinct.

Your indexes maybe aren't ideal for this query, as mentioned.
The indexes that do exist might also be inefficient, due to being unclustered,
or bloated, or due to multiple columns.


This table is append-only, i.e. no updates. The partitions are clustered 
on a btree index on block_time `


"transactions_p50_block_time_idx" btree (block_time) CLUSTER



These look redundant (which doesn't matter for this the query):

Partition key: RANGE (block_number)
Indexes:
 "transactions_block_number_btree" btree (block_number DESC)
 "transactions_block_number_hash_key" UNIQUE CONSTRAINT, btree 
(block_number, hash)
 "transactions_block_number_time" btree (hash, block_number)

Maybe that would be an index just on "hash", which might help here.

Possibly you'd want to try to use a BRIN index on timestamp (or maybe
block_number?).


Yeah this could be a good idea, but the size of this table doesn't let 
me add any indexes while it's online. I'll revisit these the next time 
we redeploy the database.




Maybe you'd want to VACUUM the table to allow index-only scan on the hash
columns ?

Maybe you'd want to check if reindexing reduces the index size ?  We don't know
if the table gets lots of UPDATE/DELETE or if any of the columns have high
logical vs physical "correlation".
https://wiki.postgresql.org/wiki/Slow_Query_Questions#Statistics:_n_distinct.2C_MCV.2C_histogram

Have you ANALYZED the partitioned parent recently ?
This isn't handled by autovacuum.


As mentioned above there aren't any updates or deletes to this table. 
Both tables have been ANALYZEd. I ran that query and the output is here 
https://gist.github.com/mewwts/86ef43ff82120e104a654cd7fbb5ec06. I ran 
it for the two specific columns and all partitions for the transactions 
table, and for all columns on "Pair_evt_Mint". Does these values tell 
you anything?






Re: Query Performance / Planner estimate off

2020-10-22 Thread Justin Pryzby
On Thu, Oct 22, 2020 at 09:36:03AM +0200, Mats Olsen wrote:
> On 10/22/20 8:37 AM, Justin Pryzby wrote:
> > These look redundant (which doesn't matter for this the query):
> > 
> > Partition key: RANGE (block_number)
> > Indexes:
> >  "transactions_block_number_btree" btree (block_number DESC)
> >  "transactions_block_number_hash_key" UNIQUE CONSTRAINT, btree 
> > (block_number, hash)
> >  "transactions_block_number_time" btree (hash, block_number)
> > 
> > Maybe that would be an index just on "hash", which might help here.
> > 
> > Possibly you'd want to try to use a BRIN index on timestamp (or maybe
> > block_number?).
> 
> Yeah this could be a good idea, but the size of this table doesn't let me
> add any indexes while it's online. I'll revisit these the next time we
> redeploy the database.

Why not CREATE INDEX CONCURRENTLY ?
It seems to me you could add BRIN on all correlated indexes.  It's nearly free.

  0.102922715 | Pair_evt_Mint | evt_block_time   | f | 0 | 
-0.56466025 | 1 |  10001 |  0.964666
   0.06872191 | Pair_evt_Mint | evt_block_time   | f | 0 |  
-0.8379525 |   500 |501 |   0.99982
   0.06872191 | Pair_evt_Mint | evt_block_number | f | 0 |  
-0.8379525 |   500 |501 |   0.99982
  0.032878816 | Pair_evt_Mint | evt_block_number | f | 0 | 
-0.56466025 |  2500 |   2501 |  0.964666

> > Maybe you'd want to VACUUM the table to allow index-only scan on the hash
> > columns ?

Did you try it ? I think this could be a big win.
Since it's append-only, autovacuum won't hit it (until you upgrade to pg13).

-- 
Justin