Upsert performance considerations (~1 mil/hour)
Hi,

I have tried doing some research for quite a while on the performance implications of the built-in upsert (INSERT ... ON CONFLICT UPDATE ...) when a lot of upserts are made. The scale is something like 1 million records/hour, split up into groups of around 300 records each.

The table is fairly simple: just a few int/bool fields and a date field. The total size of the table is somewhere between 50 and 60 million records. Essentially, all the rows are supposed to stay up to date within a certain cycle (currently about 2 days). Sometimes a lot of information changes, sometimes very little or nothing changes (e.g. 300 records get upserted but they are identical to the existing records).

So far, one hypothesis is that this project is suffering from the large amount of writes that happen constantly, since even if the upsert results in no inserts/updates, the "failed" inserts from the upsert will still get written somewhere (as far as we know). Therefore, the idea is to use old-fashioned upserts (writable CTEs) and do more granular operations that only insert data that doesn't already exist, and only update data that has actually changed. Naturally, however, this will put more read load on the DB and increase query complexity.

The question is: are we right in our assumption that the built-in upsert is useless from a performance perspective (i.e. it's only good for much smaller tasks), or are we wrong? I have read a bit about HOT updates and autovacuum tuning, but nothing that addresses a situation closer to this one.

Worth mentioning is that this DB (PostgreSQL 10.9) is running on Heroku, so we are not able to tune it to our needs. We are planning to move at some point, depending on how important that ends up being. Finally, it is also worth mentioning that this DB has one follower (i.e. a read replica).

Would really appreciate some good insight on this! Thanks beforehand.

Best,
Fredrik
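For reference, the built-in upsert pattern described above might look roughly like the sketch below; the table and column names are made up for the example (the actual schema is only described as a few int/bool fields and a date field):

    -- Hypothetical table: records(id int PRIMARY KEY, a int, b boolean, updated_at date)
    INSERT INTO records (id, a, b, updated_at)
    VALUES
        (1, 10, true,  DATE '2019-09-04'),
        (2, 20, false, DATE '2019-09-04')
        -- ... up to ~300 rows per statement
    ON CONFLICT (id) DO UPDATE
    SET a          = EXCLUDED.a,
        b          = EXCLUDED.b,
        updated_at = EXCLUDED.updated_at;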
Re: Upsert performance considerations (~1 mil/hour)
Thanks for the response Jeff!

On Wed, Sep 4, 2019 at 3:58 PM Jeff Janes wrote:
> On Wed, Sep 4, 2019 at 1:30 PM Fredrik Blomqvist <[email protected]> wrote:
>> Hi,
>>
>> I have tried doing some research for quite a while on the performance
>> implications of the built-in upsert (INSERT ... ON CONFLICT UPDATE ...) when
>> a lot of upserts are made. The scale is something like 1 million
>> records/hour, split up into groups of around 300 records each.
>
> How is that done? 300 single-valued insert statements, grouped into one
> transaction? One 300-valued insert statement?

It's done using one 300-valued insert.

>> So far, one hypothesis is that this project seems to be suffering from
>> the large amount of writes that happen constantly, since even if the upsert
>> results in no inserts/updates, the "failed" inserts from the upsert will
>> still get written somewhere (as far as we know).
>
> You can suppress redundant updates with a trigger, as described at
> https://www.postgresql.org/docs/current/functions-trigger.html. This
> works even for updates that are the result of INSERT .. ON CONFLICT .. UPDATE.
> There is still some writing, as each tuple does get locked, but it is much
> less (at least from a WAL perspective). You can also put a WHERE clause
> on the DO UPDATE so it only updates if a field has changed, but you have to
> list each field connected with OR.

Didn't know about the trigger method, handy. We were planning on utilizing the WHERE clause to prevent unnecessary updates, so I suppose that will make the situation slightly better. However, we are still left with the unnecessary insert, right? If all 300 values already exist and are up to date, there will be a failed insert that will have to be vacuumed, right? Which in turn means that we'd probably need to tune the autovacuuming to a more aggressive setting if we want to use this kind of upsert.

>> Therefore, the idea is to utilize old-fashioned upserts (writable CTEs)
>> and do more granular operations that can make sure to only insert data that
>> doesn't already exist, and only update data that has actually changed.
>> Naturally, however, this will put more read load on the DB and increase
>> query complexity.
>
> It shouldn't put a meaningful additional read load on the database, as the
> ON CONFLICT code still needs to do the read as well. Yes, it makes the
> code slightly more complex.

Right, okay. Based on what I have told you so far, would you recommend going with the old-fashioned upsert or the built-in one? Or is there some other key information that could swing that decision?

Best,
Fredrik
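For illustration, the "old-fashioned" writable-CTE upsert being considered might look roughly like the sketch below, reusing the hypothetical records(id, a, b, updated_at) table from earlier; it updates only rows whose values actually differ and inserts only rows whose keys are missing:

    WITH input_rows (id, a, b, updated_at) AS (
        VALUES
            (1, 10, true,  DATE '2019-09-04'),
            (2, 20, false, DATE '2019-09-04')
    ),
    updated AS (
        -- Update only rows that actually changed
        UPDATE records r
           SET a = i.a, b = i.b, updated_at = i.updated_at
          FROM input_rows i
         WHERE r.id = i.id
           AND (r.a IS DISTINCT FROM i.a
                OR r.b IS DISTINCT FROM i.b
                OR r.updated_at IS DISTINCT FROM i.updated_at)
        RETURNING r.id
    )
    -- Insert only rows whose keys do not exist yet
    INSERT INTO records (id, a, b, updated_at)
    SELECT i.id, i.a, i.b, i.updated_at
      FROM input_rows i
     WHERE NOT EXISTS (SELECT 1 FROM records r WHERE r.id = i.id);

Unlike INSERT ... ON CONFLICT, this form does not by itself protect against a concurrent session inserting the same key between the existence check and the insert.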
Re: Upsert performance considerations (~1 mil/hour)
On Wed, Sep 4, 2019 at 1:30 PM Fredrik Blomqvist <[email protected]> wrote:
> Hi,
>
> I have tried doing some research for quite a while on the performance
> implications of the built-in upsert (INSERT ... ON CONFLICT UPDATE ...) when
> a lot of upserts are made. The scale is something like 1 million
> records/hour, split up into groups of around 300 records each.

How is that done? 300 single-valued insert statements, grouped into one transaction? One 300-valued insert statement?

> So far, one hypothesis is that this project seems to be suffering from the
> large amount of writes that happen constantly, since even if the upsert
> results in no inserts/updates, the "failed" inserts from the upsert will
> still get written somewhere (as far as we know).

You can suppress redundant updates with a trigger, as described at https://www.postgresql.org/docs/current/functions-trigger.html. This works even for updates that are the result of INSERT .. ON CONFLICT .. UPDATE. There is still some writing, as each tuple does get locked, but it is much less (at least from a WAL perspective). You can also put a WHERE clause on the DO UPDATE so it only updates if a field has changed, but you have to list each field connected with OR.

> Therefore, the idea is to utilize old-fashioned upserts (writable CTEs)
> and do more granular operations that can make sure to only insert data that
> doesn't already exist, and only update data that has actually changed.
> Naturally, however, this will put more read load on the DB and increase
> query complexity.

It shouldn't put a meaningful additional read load on the database, as the ON CONFLICT code still needs to do the read as well. Yes, it makes the code slightly more complex.

Cheers,
Jeff
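For illustration, the two techniques described above might look roughly like this on the hypothetical records(id, a, b, updated_at) table (names invented for the example):

    -- Documented trigger function that skips updates which would not change the row;
    -- the name starts with "zzz" so it fires after other per-row triggers.
    CREATE TRIGGER zzz_suppress_redundant_updates
    BEFORE UPDATE ON records
    FOR EACH ROW EXECUTE PROCEDURE suppress_redundant_updates_trigger();

    -- Alternatively, skip the update inside the upsert itself,
    -- listing each field connected with OR:
    INSERT INTO records (id, a, b, updated_at)
    VALUES (1, 10, true, DATE '2019-09-04')
    ON CONFLICT (id) DO UPDATE
    SET a = EXCLUDED.a, b = EXCLUDED.b, updated_at = EXCLUDED.updated_at
    WHERE records.a          IS DISTINCT FROM EXCLUDED.a
       OR records.b          IS DISTINCT FROM EXCLUDED.b
       OR records.updated_at IS DISTINCT FROM EXCLUDED.updated_at;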
Re: Incorrect choice of Nested Loop for a skewed distribution
On Tue, Sep 03, 2019 at 09:47:42PM +0400, Oleg Kharin wrote:
> Hi All,
>
> With PostgreSQL 10 and 11, the planner doesn't use the lists of most common
> values to determine the selectivity of "=" for Nested Loop as it does for a
> normal inner join in eqjoinsel_inner(). Incorrect choice of a nested loops
> join strategy causes poor query performance.
> To demonstrate it one can make the following test case: ...
> The columns f and g of both tables t and u have a skewed distribution: 10010
> values of 0 and 10010 unique values starting from 1.
> Let's see the query plan for the join of t and u:
>
> explain analyze select * from t,u where t.f=u.f and t.g=u.g;
>  Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual time=0.042..22540.629 rows=20020 loops=1)
>    ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.011..3.025 rows=20020 loops=1)
>    ->  Index Scan using t_f on t  (cost=0.29..0.36 rows=1 width=8) (actual time=0.565..1.125 rows=1 loops=20020)
>          Index Cond: (f = u.f)
>          Filter: (u.g = g)
>          Rows Removed by Filter: 5004
>  Planning Time: 0.394 ms
>  Execution Time: 22542.639 ms
>
> After dropping the index
>    drop index t_f;
> we obtain a much better query plan (without Nested Loop):
>  Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual time=15.708..32.735 rows=20020 loops=1)
>    Merge Cond: ((t.f = u.f) AND (t.g = u.g))
>    ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual time=8.189..10.189 rows=20020 loops=1)
>          Sort Key: t.f, t.g
>          Sort Method: quicksort  Memory: 1707kB
>          ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.012..2.958 rows=20020 loops=1)
>    ->  Sort  (cost=1719.54..1769.59 rows=20020 width=8) (actual time=7.510..9.459 rows=20020 loops=1)
>          Sort Key: u.f, u.g
>          Sort Method: quicksort  Memory: 1707kB
>          ->  Seq Scan on u  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.008..2.748 rows=20020 loops=1)
>  Planning Time: 0.239 ms
>  Execution Time: 34.585 ms

> Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual...
> Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual...

When you dropped the index, you necessarily refused the plan involving an index scan, so it did a merge join instead (which it thinks of as expensive because it has to sort both sides).

As you saw, the rowcount estimate of the join result is badly off, but that's true of both plans. The choice of join type is affected by the size of its *inputs*, but doesn't and shouldn't have any effect on the result rowcount (or its estimate). The rowcount *output* of the join would only affect any *following* plan nodes (of which there are none in this case).

You suggested using the MCV list, but I don't think that's possible, since the nested loop is evaluating its "inner" table multiple times, in a "loop":

> ->  Index Scan using t_f on t  (cost=0.29..0.36 rows=1 width=8) (actual time=0.565..1.125 rows=1 loops=20020)

Hypothetically, one could plan the query 20k times, for each value of u.f and u.g, but that's tantamount to actually executing the query, which is why it uses (I gather) var_eq_non_const.
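For what it's worth, the per-column MCV lists being discussed can be inspected directly; assuming the table and column names from the test case, something like:

    SELECT tablename, attname, n_distinct, most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename IN ('t', 'u') AND attname IN ('f', 'g');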
If you enable BUFFERS, you can see:

postgres=# explain (analyze on, buffers) select * from t,u WHERE t.g=u.g AND t.f=u.f;
 Nested Loop  (cost=0.29..7629.95 rows=25055030 width=16) (actual time=0.031..22634.482 rows=20020 loops=1)
   Buffers: shared hit=770913
   ->  Seq Scan on t  (cost=0.00..289.20 rows=20020 width=8) (actual time=0.011..22.883 rows=20020 loops=1)
         Buffers: shared hit=89
   ->  Index Scan using u_f_idx on u  (cost=0.29..0.36 rows=1 width=8) (actual time=0.596..1.125 rows=1 loops=20020)
         ...
         Buffers: shared hit=770824

vs.

postgres=# SET enable_nestloop=off;
postgres=# explain (analyze on, buffers) select * from t,u WHERE t.g=u.g AND t.f=u.f;
 Merge Join  (cost=3439.09..442052.26 rows=25055030 width=16) (actual time=74.262..187.454 rows=20020 loops=1)
   Merge Cond: ((t.g = u.g) AND (t.f = u.f))
   Buffers: shared hit=178
   ...

So the nested loop plan is hitting 770k buffer pages (5GB) while looping 20k times, rather than 178 buffers when each page is read once (to sort).

Perhaps you could get a good plan by playing with these, but it's unclear why that's necessary:

 effective_cache_size | 1280
 cpu_index_tuple_cost | 0.005
 cpu_operator_cost    | 0.0025
 cpu_tuple_cost       | 0.01
 random_page_cost     | 4
 seq_page_cost        | 1

Also, since PG 9.6, FKs can improve join estimates:

postgres=# CREATE UNIQUE INDEX ON u(f,g);
postgres=# CREATE UNIQUE INDEX ON t(f,g);
postgres=# ALTER TABLE t ADD CONSTRAINT fk_f FOREIGN KEY (f,g) REFERENCES u(f,g);
postgres=# ALTER TABLE u ADD CONSTRAINT fk_u FOREIGN KEY (f,g) REFERENCES t(f,g);
postgres=# explain analyze select * from t,u WHERE t.g=u.g AND t.f=u.f;
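As a rough illustration of "playing with these", the cost settings can be tried per session before re-running the query; the values below are arbitrary examples, not recommendations:

    SET random_page_cost = 1.1;        -- example value, assuming the data is mostly cached
    SET effective_cache_size = '4GB';  -- example value
    EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM t, u WHERE t.g = u.g AND t.f = u.f;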
