Writing 1100 rows per second
I'm looking to write about 1100 rows per second to tables of up to 100 million rows. I'm trying to come up with a design in which all the writes go to a database with no indexes; with indexes, write performance slows down dramatically once a table grows past 30 million rows.

I was thinking of having one server dedicated to all the writes and another, indexed server for reads, using logical replication to keep the read-only server up to date.

Would that work? Or do you have any recommendations for achieving good performance with a lot of writes? Thank you
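For reference, the setup being proposed corresponds roughly to PostgreSQL's built-in logical replication. A minimal sketch, with hypothetical table, host, and role names (the replies below discuss why it may not help), could look like this:

    -- on the write server: publish the heavily written table
    CREATE PUBLICATION writes_pub FOR TABLE big_table;

    -- on the read server: the same table exists with whatever indexes reads need,
    -- and a subscription replays the incoming writes
    CREATE SUBSCRIPTION reads_sub
        CONNECTION 'host=write-server dbname=appdb user=replicator'
        PUBLICATION writes_pub;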
Re: Writing 1100 rows per second
On Wed, 2020-02-05 at 12:03 -0500, Arya F wrote:
> I'm looking to write about 1100 rows per second to tables of up to 100 million
> rows. I'm trying to come up with a design in which all the writes go to a
> database with no indexes; with indexes, write performance slows down
> dramatically once a table grows past 30 million rows.
>
> I was thinking of having one server dedicated to all the writes and another,
> indexed server for reads, using logical replication to keep the read-only
> server up to date.
>
> Would that work? Or do you have any recommendations for achieving good
> performance with a lot of writes?

Logical replication wouldn't make a difference: with many indexes, replay of the inserts would be just as slow, and replication would lag more and more.

No matter what you do, there is no magic way to have your tables indexed and have fast inserts at the same time.

One idea is a table partitioned by a column that appears in a selective search condition, but with no indexes on the table, so that you always get away with a sequential scan of a single partition.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
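A minimal sketch of the partitioning idea Laurenz describes, assuming the selective search column is a customer id; the table and column names are hypothetical:

    CREATE TABLE events (
        customer_id bigint      NOT NULL,
        created_at  timestamptz NOT NULL,
        payload     text
    ) PARTITION BY LIST (customer_id);

    CREATE TABLE events_cust_1 PARTITION OF events FOR VALUES IN (1);
    CREATE TABLE events_cust_2 PARTITION OF events FOR VALUES IN (2);

    -- no indexes anywhere, so inserts stay cheap; a query such as
    --   SELECT * FROM events WHERE customer_id = 1 AND created_at > now() - interval '1 day'
    -- is pruned to a single partition and answered with a sequential scan
    -- of just that partition.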
Re: Writing 1100 rows per second
On Wed, Feb 05, 2020 at 12:03:52PM -0500, Arya F wrote:
> I'm looking to write about 1100 rows per second to tables of up to 100 million
> rows. I'm trying to come up with a design in which all the writes go to a
> database with no indexes; with indexes, write performance slows down
> dramatically once a table grows past 30 million rows.
>
> I was thinking of having one server dedicated to all the writes and another,
> indexed server for reads, using logical replication to keep the read-only
> server up to date.

Wouldn't the read-only server still have bad performance for all the writes being replicated to it?

> Would that work? Or do you have any recommendations for achieving good
> performance with a lot of writes?

Can you use partitioning so that the updates mostly affect only one table at a time, and its indexes are of reasonable size, such that they fit easily in shared_buffers?

BRIN indexes may help for some of your indexes, but likely not for all of them.

Justin
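A sketch of what Justin suggests, assuming rows arrive roughly in time order; partition bounds and column names are assumptions:

    -- range-partition by arrival time so inserts only touch the current partition
    CREATE TABLE measurements (
        device_id   bigint      NOT NULL,
        recorded_at timestamptz NOT NULL,
        value       double precision
    ) PARTITION BY RANGE (recorded_at);

    CREATE TABLE measurements_2020_02 PARTITION OF measurements
        FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');

    -- a BRIN index on the naturally ordered column stays tiny and cheap to maintain
    CREATE INDEX ON measurements_2020_02 USING brin (recorded_at);

    -- per-partition b-tree indexes stay small enough to fit in shared_buffers
    CREATE INDEX ON measurements_2020_02 (device_id, recorded_at);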
Re: Writing 1100 rows per second
If I ran the database on a server with enough RAM to hold all the indexes and tables, and it only updated the indexes on the HDD every X seconds, would that increase performance dramatically?

On Wed, Feb 5, 2020, 12:15 PM Justin Pryzby wrote:
> Wouldn't the read-only server still have bad performance for all the writes
> being replicated to it?
>
> Can you use partitioning so that the updates mostly affect only one table at
> a time, and its indexes are of reasonable size, such that they fit easily in
> shared_buffers?
>
> BRIN indexes may help for some of your indexes, but likely not for all of them.
>
> Justin
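For reference, the stock settings that come closest to the "keep everything in RAM and flush periodically" idea are sketched below. The values are illustrative assumptions, not recommendations, and each commit still has to write WAL unless synchronous_commit is relaxed:

    ALTER SYSTEM SET shared_buffers = '16GB';       -- cache tables and indexes in RAM (needs a restart)
    ALTER SYSTEM SET checkpoint_timeout = '15min';  -- flush dirty pages to disk less often
    ALTER SYSTEM SET max_wal_size = '32GB';         -- avoid extra checkpoints triggered by WAL volume
    ALTER SYSTEM SET synchronous_commit = off;      -- faster commits, at the risk of losing the
                                                    -- last few transactions after a crash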
Re: Writing 1100 rows per second
Arya,

We ran into the same issue of decreasing insert performance on tables of hundreds of millions of rows, and it is indeed due to index updates.

We tested TimescaleDB (a PostgreSQL extension) with success in all of our use cases. It does "behind the scenes" single-level partitioning that is very efficient. I'm not sure about your 1100 inserts/s, since that is hardware dependent, but we got a flat response curve: inserts per second stayed stable into the hundreds of millions of rows, regardless of indexes.

My suggestion: have a look at https://blog.timescale.com/timescaledb-vs-6a696248104e/ and do some PoCs.

Regards,
Haroldo Kerry

On Wed, Feb 5, 2020 at 2:25 PM Arya F wrote:
> If I ran the database on a server with enough RAM to hold all the indexes and
> tables, and it only updated the indexes on the HDD every X seconds, would that
> increase performance dramatically?

--
Haroldo Kerry
CTO/COO
Rua do Rócio, 220, 7° andar, conjunto 72
São Paulo – SP / CEP 04552-000
[email protected]
www.callix.com.br
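A minimal sketch of the TimescaleDB setup Haroldo describes, assuming the extension is installed; the table and column names are hypothetical:

    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE readings (
        time      timestamptz NOT NULL,
        device_id bigint      NOT NULL,
        value     double precision
    );

    -- partitions ("chunks") the table behind the scenes by the time column
    SELECT create_hypertable('readings', 'time');

    -- indexes are created per chunk, so each one stays small as the data grows
    CREATE INDEX ON readings (device_id, time DESC);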
Re: Slow performance with trivial self-joins
Thank you for your response.

I have tested the patch from the linked thread, and it works very well on a number of complex queries I tried, improving planning time significantly and execution time drastically. I have also read through the entire linked discussion thread, as well as a few other large threads linked from it, and found the discussion very interesting.

I don't believe that all such queries are "poorly written". As was discussed in the other threads, these kinds of self-joins arise naturally when you use SQL views. You can create a library of reusable views that are small, easy to understand and readable, build them up into bigger views, and finally query from those. But then you end up with lots of (hidden) self-joins. The alternative is to copy and paste the shared logic from the views into all of the queries.

I understand the need to be conservative about which optimizations to apply, so as not to waste time looking for opportunities that don't exist. One heuristic I didn't see mentioned: attempt the self-join removal only if the query references an SQL view (or several). That should be enough, because, as you say, no human would intentionally write such a query. Queries generated by ORMs were also discussed, so it might also be worthwhile to consider queries that contain inner SELECTs.
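A hedged sketch of how the view pattern described above produces a hidden self-join; the schema is made up for illustration:

    CREATE VIEW active_orders AS
        SELECT id, customer_id FROM orders WHERE status = 'active';

    CREATE VIEW order_shipping AS
        SELECT id, shipping_address FROM orders;

    -- a readable query built from the two small views...
    SELECT a.customer_id, s.shipping_address
    FROM active_orders a
    JOIN order_shipping s ON s.id = a.id;

    -- ...which the planner sees as "orders" joined to itself on its primary key,
    -- exactly the kind of join the self-join-removal patch can eliminate.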
Re: Slow performance with trivial self-joins
> You can create a library of reusable views that are small, easy to
> understand and readable, build them up into bigger views, and finally
> query from those. But then you end up with lots of (hidden) self-joins.

I will concur with this use case being pretty common, but it is also something I have actively avoided anywhere performance is important, because of the lack of this optimization.

Even so, I have 20+ views like that in my database.
Re: Slow performance with trivial self-joins
On Thu, 6 Feb 2020 at 11:12, Adam Brusselback wrote:
> I will concur with this use case being pretty common, but it is also
> something I have actively avoided anywhere performance is important,
> because of the lack of this optimization.
>
> Even so, I have 20+ views like that in my database.

I think the best way to push that forward would be to benchmark the proposed patch and see whether the overhead of detecting the self-joined relations is measurable, using various queries with varying numbers of joins.

It does not sound like it would be a great deal of effort to look through the range table for duplicate OIDs and only do the further processing to attempt self-join removal when there are some. However, if that effort happened to slow down all queries by, say, 5%, then perhaps it would be a bad idea.

People's opinions don't carry much weight in arguments like this; unbiased and reproducible benchmarks should be used as evidence to support the discussion. Doing worst-case and average-case benchmarks up front will save you time, as someone will almost certainly ask for them if you don't.

(I've not been following the thread for the patch.)

--
David Rowley                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
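As a starting point, a hedged sketch of the kind of benchmark David suggests; the tables are hypothetical, and the pgbench invocation is only one way to get repeatable numbers:

    -- worst case for the duplicate-OID check: several joins, none of them self-joins
    EXPLAIN (ANALYZE)
    SELECT count(*)
    FROM t1 JOIN t2 USING (id) JOIN t3 USING (id) JOIN t4 USING (id);

    -- target case: the same relation joined to itself on its primary key
    EXPLAIN (ANALYZE)
    SELECT count(*)
    FROM t1 a JOIN t1 b USING (id);

    -- compare "Planning Time" with and without the patch; for repeatable numbers,
    -- put each query in a file and run something like
    --   pgbench -n -f query.sql -T 60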
