Upsert performance considerations (~1 mil/hour)

2019-09-04 Thread Fredrik Blomqvist
Hi,

I have tried doing some research for quite a while on the performance
implications of the built-in upsert (INSERT ... ON CONFLICT UPDATE...) when
a lot of upserts are made. The scale is something like 1 million
records/hour, split up into groups of around 300 records each.

The table is fairly simple, just a few int/bool fields and a date field.
The total size of the table is somewhere between 50-60 million records.
Essentially, all the rows are supposed to stay up-to-date within a certain
cycle (that is currently about 2 days). Sometimes a lot of information
changes, sometimes very little/no information changes (e.g. 300 records get
upserted but they are identical to the existing records).
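
For concreteness, a hypothetical table along these lines (all names are made
up for the sake of the sketches further down in this thread):

    CREATE TABLE item_status (
        item_id     int PRIMARY KEY,
        group_id    int NOT NULL,
        is_active   boolean NOT NULL,
        checked_on  date NOT NULL
    );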

So far, one hypothesis is that this project seems to be suffering from the
large amount of writes that happen constantly since even if the upsert
results in no inserts/updates, the "failed" inserts from the upsert will
still get written somewhere (according to our knowledge). Therefore, the
idea is to utilize old-fashioned upserts (writeable CTEs) and do more
granular operations that can make sure to only insert data that doesn't
already exist, and only update data that has actually changed. Naturally,
however, this will put more read-load on the DB and increase query
complexity.
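
Roughly, the writeable-CTE variant we have in mind would look like this
(sketch only, reusing the hypothetical item_status table above; note that it
does not yet handle concurrent inserts racing each other the way ON CONFLICT
does):

    WITH batch (item_id, group_id, is_active, checked_on) AS (
        VALUES
            (1, 10, true,  DATE '2019-09-04'),
            (2, 10, false, DATE '2019-09-04')
            -- ... up to ~300 rows per batch
    ), updated AS (
        -- Update only rows whose contents actually differ.
        UPDATE item_status s
        SET    group_id   = b.group_id,
               is_active  = b.is_active,
               checked_on = b.checked_on
        FROM   batch b
        WHERE  s.item_id = b.item_id
        AND    (s.group_id, s.is_active, s.checked_on)
               IS DISTINCT FROM (b.group_id, b.is_active, b.checked_on)
        RETURNING s.item_id
    )
    -- Insert only rows that don't exist yet.
    INSERT INTO item_status (item_id, group_id, is_active, checked_on)
    SELECT b.item_id, b.group_id, b.is_active, b.checked_on
    FROM   batch b
    WHERE  NOT EXISTS (
        SELECT 1 FROM item_status s WHERE s.item_id = b.item_id
    );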

The question is, are we right in our assumptions that the built-in upsert
is useless from a performance perspective (e.g. it's only good for much
smaller tasks) or are we wrong? I read a bit about HOT updates and
autovacuum tuning, but nothing that references something more similar to
this question.

Worth mentioning is that this DB (PostgreSQL 10.9) is running on Heroku so
we are not able to tune it to our needs. We are planning to move at some
point, depending on how important it ends up being. Finally, it is also
worth mentioning that this DB also has one follower (i.e. read replica).

Would really appreciate some good insight on this! Thanks beforehand.

Best,
Fredrik


Re: Upsert performance considerations (~1 mil/hour)

2019-09-04 Thread Fredrik Blomqvist
Thanks for the response Jeff!

On Wed, Sep 4, 2019 at 3:58 PM Jeff Janes  wrote:

> On Wed, Sep 4, 2019 at 1:30 PM Fredrik Blomqvist <
> [email protected]> wrote:
>
>> Hi,
>>
>> I have tried doing some research for quite a while on the performance
>> implications of the built-in upsert (INSERT ... ON CONFLICT UPDATE...) when
>> a lot of upserts are made. The scale is something like 1 million
>> records/hour, split up into groups of around 300 records each.
>>
>
> How is that done?  300 single-valued insert statements, grouped into one
> transaction? One 300-valued insert statement?
>

It's done using one 300-valued insert.
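
Roughly like this (again with the made-up item_status schema from my first
mail):

    INSERT INTO item_status (item_id, group_id, is_active, checked_on)
    VALUES
        (1, 10, true,  '2019-09-04'),
        (2, 10, false, '2019-09-04')
        -- ... up to ~300 rows per statement
    ON CONFLICT (item_id) DO UPDATE
    SET group_id   = EXCLUDED.group_id,
        is_active  = EXCLUDED.is_active,
        checked_on = EXCLUDED.checked_on;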


>
>
>> So far, one hypothesis is that this project seems to be suffering from
>> the large amount of writes that happen constantly since even if the upsert
>> results in no inserts/updates, the "failed" inserts from the upsert will
>> still get written somewhere (according to our knowledge).
>>
>
> You can suppress redundant updates with a trigger, as described
> https://www.postgresql.org/docs/current/functions-trigger.html.  This
> works even for updates that are the result of insert..on conflict..update.
> There is still some writing, as each tuple does get locked, but it is much
> less (at least from a WAL perspective).  You can also put a WHERE clause
> on the DO UPDATE so it only updates if a field has changed, but you have to
> list each field connected with OR.
>

Didn't know about the trigger method, handy. We were planning on utilizing
the WHERE clause to prevent unnecessary updates, so I suppose that will
make the situation slightly better. However, we are still left with the
unnecessary insert, right? If all 300 values already exist and are up to
date, there will be a failed insert that will have to be vacuumed, right?
That in turn means we'd probably need to tune autovacuum to a more
aggressive setting if we want to use this kind of upsert.
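
For reference, the WHERE-clause variant we had in mind is roughly this (a
row-wise IS DISTINCT FROM rather than ORing each field; same made-up schema
as before):

    INSERT INTO item_status (item_id, group_id, is_active, checked_on)
    VALUES
        (1, 10, true,  '2019-09-04'),
        (2, 10, false, '2019-09-04')
        -- ... up to ~300 rows per batch
    ON CONFLICT (item_id) DO UPDATE
    SET group_id   = EXCLUDED.group_id,
        is_active  = EXCLUDED.is_active,
        checked_on = EXCLUDED.checked_on
    WHERE (item_status.group_id, item_status.is_active, item_status.checked_on)
          IS DISTINCT FROM
          (EXCLUDED.group_id, EXCLUDED.is_active, EXCLUDED.checked_on);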


> Therefore, the idea is to utilize old-fashioned upserts (writeable CTEs)
>> and do more granular operations that can make sure to only insert data that
>> doesn't already exist, and only update data that has actually changed.
>> Naturally, however, this will put more read-load on the DB and increase
>> query complexity.
>>
>
> It shouldn't put a meaningful additional read load on the database, as the
> ON CONFLICT code still needs to do the read as well.  Yes, it makes the
> code slightly more complex.
>

Right, okay. Based on what I have told you so far, would you recommend
going with the old-fashioned upsert or the built-in one? Or is there some
other key information that could swing that decision?

Best,
Fredrik

>


Re: Upsert performance considerations (~1 mil/hour)

2019-09-04 Thread Jeff Janes
On Wed, Sep 4, 2019 at 1:30 PM Fredrik Blomqvist <
[email protected]> wrote:

> Hi,
>
> I have tried doing some research for quite a while on the performance
> implications of the built-in upsert (INSERT ... ON CONFLICT UPDATE...) when
> a lot of upserts are made. The scale is something like 1 million
> records/hour, split up into groups of around 300 records each.
>

How is that done?  300 single-valued insert statements, grouped into one
transaction? One 300-valued insert statement?


> So far, one hypothesis is that this project seems to be suffering from the
> large amount of writes that happen constantly since even if the upsert
> results in no inserts/updates, the "failed" inserts from the upsert will
> still get written somewhere (according to our knowledge).
>

You can suppress redundant updates with a trigger, as described
https://www.postgresql.org/docs/current/functions-trigger.html.  This works
even for updates that are the result of insert..on conflict..update.  There
is still some writing, as each tuple does get locked, but it is much less
(at least from a WAL perspective).  You can also put a WHERE clause on
the DO UPDATE so it only updates if a field has changed, but you have to
list each field connected with OR.
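
For example, something like this (untested sketch, with "your_table" standing
in for the real table name):

    -- suppress_redundant_updates_trigger() is built in; the "z" prefix just
    -- makes it fire last among any BEFORE UPDATE triggers on the table.
    CREATE TRIGGER z_suppress_redundant
    BEFORE UPDATE ON your_table
    FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();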


> Therefore, the idea is to utilize old-fashioned upserts (writeable CTEs)
> and do more granular operations that can make sure to only insert data that
> doesn't already exist, and only update data that has actually changed.
> Naturally, however, this will put more read-load on the DB and increase
> query complexity.
>

It shouldn't put a meaningful additional read load on the database, as the
ON CONFLICT code still needs to do the read as well.  Yes, it makes the
code slightly more complex.

Cheers,

Jeff

>


Re: Incorrect choice of Nested Loop for a skewed distribution

2019-09-04 Thread Justin Pryzby
On Tue, Sep 03, 2019 at 09:47:42PM +0400, Oleg Kharin wrote:
> Hi All,
> 
> With PostgreSQL 10 and 11, the planner doesn't use the lists of most common
> values to determine the selectivity of "=" for Nested Loop as it does for a
> normal inner join in eqjoinsel_inner(). Incorrect choice of a nested loops
> join strategy causes poor query performance.
> To demonstrate it one can make the following test case:
...
> The columns f and g of both tables t and u have a skewed distribution: 10010
> values of 0 and 10010 unique values starting from 1.
> Let's see query plan for the join of t and u:
> 
>   explain analyze select * from t,u where t.f=u.f and t.g=u.g;
>  Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual 
> time=0.042..22540.629 rows=20020 loops=1)
>    ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual 
> time=0.011..3.025 rows=20020 loops=1)
>    ->  Index Scan using t_f on t  (cost=0.29..0.36 rows=1 width=8) (actual 
> time=0.565..1.125 rows=1 loops=20020)
>  Index Cond: (f = u.f)
>  Filter: (u.g = g)
>  Rows Removed by Filter: 5004
>  Planning Time: 0.394 ms
>  Execution Time: 22542.639 ms
> 
> After dropping the index
>   drop index t_f;
> we obtain much better query plan (without Nested Loop):
>  Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual 
> time=15.708..32.735 rows=20020 loops=1)
>    Merge Cond: ((t.f = u.f) AND (t.g = u.g))
>    ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual 
> time=8.189..10.189 rows=20020 loops=1)
>  Sort Key: t.f, t.g
>  Sort Method: quicksort  Memory: 1707kB
>  ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual 
> time=0.012..2.958 rows=20020 loops=1)
>    ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual 
> time=7.510..9.459 rows=20020 loops=1)
>  Sort Key: u.f, u.g
>  Sort Method: quicksort  Memory: 1707kB
>  ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual 
> time=0.008..2.748 rows=20020 loops=1)
>  Planning Time: 0.239 ms
>  Execution Time: 34.585 ms

>  Nested Loop  (cost=0.29.. 7629.95 rows=25055030 width=16) (actual...
>  Merge Join   (cost=3439.09..442052.26 rows=25055030 width=16) (actual...

When you dropped the index, you necessarily ruled out the plan involving an
index scan.  So it did a merge join instead (which it thinks of as expensive
because it has to sort both sides).

As you saw, the rowcount estimate of the join result is badly off.  But that's
true of both plans.

Choice of join type is affected by the size of its *inputs*, but doesn't and
shouldn't have any effect on the result rowcount (or its estimate).  The
rowcount *output* of the join would only affect any *following* plan nodes (of
which there are none in this case).

You suggested using the MCV list, but I don't think that's possible, since the
nested loop is evaluating its "inner" table multiple times, in a "loop":

>    ->  Index Scan using t_f on t  (cost=0.29..0.36 rows=1 width=8) (actual 
> time=0.565..1.125 rows=1 loops=20020)

Hypothetically, one could plan the query 20k times, for each value of u.f and
u.g, but that's tantamount to actually executing the query, which is why it
uses (I gather) var_eq_non_const.

If you enable BUFFERS, you can see:

postgres=# explain (analyze on,buffers) select * from t,u WHERE t.g=u.g AND 
t.f=u.f ;
 Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual 
time=0.031..22634.482 rows=20020 loops=1)
   Buffers: shared hit=770913
   ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual 
time=0.011..22.883 rows=20020 loops=1)
 Buffers: shared hit=89
   ->  Index Scan using u_f_idx on u  (cost=0.29..0.36 rows=1 width=8) (actual 
time=0.596..1.125 rows=1 loops=20020)
 ...
 Buffers: shared hit=770824

vs.

postgres=# SET enable_nestloop=off;
postgres=# explain (analyze on,buffers) select * from t,u WHERE t.g=u.g AND 
t.f=u.f ;
 Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual 
time=74.262..187.454 rows=20020 loops=1)
   Merge Cond: ((t.g = u.g) AND (t.f = u.f))
   Buffers: shared hit=178
   ...

So the nested loop plan is hitting 770k buffer pages (5GB) while looping 20k
times, rather than 178 buffers when each page is read once (to sort).

Perhaps you could get a good plan by playing with these settings, but it's
unclear why that should be necessary:

 effective_cache_size | 1280
 cpu_index_tuple_cost | 0.005
 cpu_operator_cost    | 0.0025
 cpu_tuple_cost       | 0.01
 random_page_cost     | 4
 seq_page_cost        | 1
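
For example, experimenting at the session level and comparing plans, something
like the following (the value is arbitrary, just to see how the planner
reacts):

    SET random_page_cost = 8;   -- make the repeated index probes look pricier
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t, u WHERE t.g = u.g AND t.f = u.f;
    RESET random_page_cost;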

Also, since PG 9.6, FKs can improve join estimates:

postgres=# CREATE UNIQUE INDEX ON u(f,g);
postgres=# CREATE UNIQUE INDEX ON t(f,g);
postgres=# ALTER TABLE t ADD CONSTRAINT fk_f FOREIGN KEY (f,g) REFERENCES 
u(f,g);
postgres=# ALTER TABLE u ADD CONSTRAINT fk_u FOREIGN KEY (f,g) REFERENCES 
t(f,g);
postgres=# explain analyze select * from t,u WHERE t.g=u.g AND t.f=u.f ;