Writing 1100 rows per second

2020-02-05 Thread Arya F
I'm looking to write about 1100 rows per second to tables of up to 100 million
rows. I'm trying to come up with a design where I can do all the writes to a
database with no indexes, because with indexes the write performance slows
down dramatically once a table grows beyond 30 million rows.


I was thinking of having a server dedicated to all the writes and another
server for reads that has indexes, using logical replication to keep the
read-only server up to date.


Would that work? Or do you have any recommendations for how I can achieve
good performance for a lot of writes?

Thank you


Re: Writing 1100 rows per second

2020-02-05 Thread Laurenz Albe
On Wed, 2020-02-05 at 12:03 -0500, Arya F wrote:
> I'm looking to write about 1100 rows per second to tables of up to 100
> million rows. I'm trying to come up with a design where I can do all the
> writes to a database with no indexes, because with indexes the write
> performance slows down dramatically once a table grows beyond 30 million
> rows.
> 
> I was thinking of having a server dedicated to all the writes and another
> server for reads that has indexes, using logical replication to keep the
> read-only server up to date.
> 
> Would that work? Or do you have any recommendations for how I can achieve
> good performance for a lot of writes?

Logical replication wouldn't make a difference, because with many indexes
replay of the inserts would be slow as well, and replication would lag more
and more.

No matter what you do, there will be no magic way to have your tables indexed
and have fast inserts at the same time.

One idea I can come up with is a table that is partitioned by a column that
appears in a selective search condition, but with no indexes on the table, so
that you always get away with a sequential scan of a single partition.
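
For illustration, a minimal sketch of that idea, assuming a hypothetical
"events" table whose selective search condition filters on a made-up
created_at column:

    -- Hypothetical: range-partition by the column used in the selective
    -- search condition, and create no indexes anywhere, so inserts stay
    -- cheap and a filtered query only seq-scans one partition.
    CREATE TABLE events (
        id         bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    text
    ) PARTITION BY RANGE (created_at);

    CREATE TABLE events_2020_01 PARTITION OF events
        FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');
    CREATE TABLE events_2020_02 PARTITION OF events
        FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');

    -- Partition pruning limits this to a sequential scan of events_2020_02.
    SELECT *
    FROM events
    WHERE created_at >= '2020-02-05' AND created_at < '2020-02-06';

The query cost is then bounded by the size of one partition rather than the
whole table, and inserts touch no indexes at all.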

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Writing 1100 rows per second

2020-02-05 Thread Justin Pryzby
On Wed, Feb 05, 2020 at 12:03:52PM -0500, Arya F wrote:
> I'm looking to write about 1100 rows per second to tables of up to 100
> million rows. I'm trying to come up with a design where I can do all the
> writes to a database with no indexes, because with indexes the write
> performance slows down dramatically once a table grows beyond 30 million
> rows.
> 
> I was thinking of having a server dedicated to all the writes and another
> server for reads that has indexes, using logical replication to keep the
> read-only server up to date.

Wouldn't the read-only server still have bad performance for all the writes
being replicated to it?

> Would that work? Or do you have any recommendations for how I can achieve
> good performance for a lot of writes?

Can you use partitioning so that the updates mostly affect only one partition
at a time, with indexes of a reasonable size that can fit easily in
shared_buffers?

BRIN indexes may help for some of them, but likely not for all of your
indexes.
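
For example, a sketch only, assuming a hypothetical append-mostly "events"
table whose created_at values roughly follow physical row order (table and
column names are made up):

    -- A BRIN index stores only per-block-range summaries, so it stays tiny
    -- and is cheap to maintain on inserts, at the price of less precise
    -- lookups than a btree.
    CREATE INDEX events_created_at_brin
        ON events USING brin (created_at);

    -- The equivalent btree would be precise, but much larger and more
    -- expensive to update on every insert:
    -- CREATE INDEX events_created_at_btree ON events (created_at);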

Justin




Re: Writing 1100 rows per second

2020-02-05 Thread Arya F
If I run the database on a server that has enough RAM to load all the indexes
and tables into RAM, and it then updated the indexes on the HDD only every X
seconds, would that work to increase performance dramatically?

On Wed, Feb 5, 2020, 12:15 PM Justin Pryzby  wrote:

> On Wed, Feb 05, 2020 at 12:03:52PM -0500, Arya F wrote:
> > I'm looking to write about 1100 rows per second to tables of up to 100
> > million rows. I'm trying to come up with a design where I can do all the
> > writes to a database with no indexes, because with indexes the write
> > performance slows down dramatically once a table grows beyond 30 million
> > rows.
> >
> > I was thinking of having a server dedicated to all the writes and another
> > server for reads that has indexes, using logical replication to keep the
> > read-only server up to date.
>
> Wouldn't the read-only server still have bad performance for all the writes
> being replicated to it?
>
> > Would that work? Or do you have any recommendations for how I can achieve
> > good performance for a lot of writes?
>
> Can you use partitioning so that the updates mostly affect only one
> partition at a time, with indexes of a reasonable size that can fit easily
> in shared_buffers?
>
> BRIN indexes may help for some of them, but likely not for all of your
> indexes.
>
> Justin
>


Re: Writing 1100 rows per second

2020-02-05 Thread Haroldo Kerry
Arya,
We ran into the same issue of decreasing insert performance for tables of
hundreds of millions of rows, and it is indeed due to index updates.
We tested TimescaleDB (a PostgreSQL extension) with success in all of the use
cases we have. It does "behind the scenes" single-level partitioning that is
indeed very efficient.
I'm not sure about the 1100 inserts/s, as that is hardware dependent, but we
got a flat response curve (inserts per second stayed stable with hundreds of
millions of rows, regardless of indexes).
My suggestion: have a look at
https://blog.timescale.com/timescaledb-vs-6a696248104e/ and do some PoCs.
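
For reference, converting a plain table into a hypertable looks roughly like
this (a sketch only; the "events" table, its columns, and the secondary index
are made-up examples):

    CREATE EXTENSION IF NOT EXISTS timescaledb;

    CREATE TABLE events (
        created_at timestamptz NOT NULL,
        device_id  bigint      NOT NULL,
        payload    text
    );

    -- TimescaleDB then partitions the table into time "chunks" behind the
    -- scenes, keeping each chunk's indexes small.
    SELECT create_hypertable('events', 'created_at');

    CREATE INDEX ON events (device_id, created_at DESC);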

Regards,
Haroldo Kerry

On Wed, Feb 5, 2020 at 2:25 PM Arya F  wrote:

> If I run the database on a server that has enough RAM to load all the
> indexes and tables into RAM, and it then updated the indexes on the HDD
> only every X seconds, would that work to increase performance dramatically?

-- 

Haroldo Kerry

CTO/COO

Rua do Rócio, 220, 7° andar, conjunto 72

São Paulo – SP / CEP 04552-000

[email protected]

www.callix.com.br


Re: Slow performance with trivial self-joins

2020-02-05 Thread Benny Kramek
Thank you for your response. I have tested the patch from the linked
thread and it works very well on a number of complex queries I tried,
improving planning time significantly and execution time drastically.

I have also read through the entire linked discussion thread as well
as a few other large threads linked from it, and found the discussion
very interesting.

I don't believe that all such queries are "poorly written". As was
discussed in the other threads, these types of self-joins can arise
when you use SQL views. You can create a library of reusable views
that are small, easy-to-understand and readable. Then you build them
up into bigger views, and finally query from them. But then you end up
with lots of (hidden) self-joins. The alternative is to copy and paste
the shared logic from the views into all of the queries.
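
A contrived sketch of how that happens, with made-up table, column, and view
names (assuming "id" is the primary key of "customers"):

    -- Two small, reusable views over the same table.
    CREATE VIEW active_customers AS
        SELECT id, name FROM customers WHERE active;

    CREATE VIEW customer_balances AS
        SELECT id, balance FROM customers;

    -- Composing them joins "customers" to itself on its primary key, which
    -- is exactly the kind of trivial self-join the patch could remove.
    SELECT a.name, b.balance
    FROM active_customers a
    JOIN customer_balances b ON b.id = a.id;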

I understand the need to be conservative about which optimizations to
apply, in order not to waste time looking for opportunities that don't
exist. One idea I had that I didn't see mentioned is the following
heuristic: only try to apply the self_join_removal optimization if the
query references an SQL view (or multiple views). This should be
enough because, as you say, no human would intentionally write such a
query. Queries generated by ORMs were also discussed, so it might also
be beneficial to consider queries that contain inner SELECTs.




Re: Slow performance with trivial self-joins

2020-02-05 Thread Adam Brusselback
> You can create a library of
> reusable views that are small, easy-to-understand and readable. Then
> you build them up into bigger views, and finally query from them. But
> then you end up with lots of (hidden) self-joins.

I will concur that this use case is pretty common, but it is also something
I have actively avoided anywhere performance is important, because of the
lack of this optimization.

Even still, I have 20+ views like that in my database.


Re: Slow performance with trivial self-joins

2020-02-05 Thread David Rowley
On Thu, 6 Feb 2020 at 11:12, Adam Brusselback  wrote:
>
> > You can create a library of
> > reusable views that are small, easy-to-understand and readable. Then
> > you build them up into bigger views, and finally query from them. But
> > then you end up with lots of (hidden) self-joins.
>
> I will concur that this use case is pretty common, but it is also something
> I have actively avoided anywhere performance is important, because of the
> lack of this optimization.
>
> Even still, I have 20+ views like that in my database.

I think the best way to push that forward would be to benchmark the
proposed patch and see whether the overhead of detecting the self-joined
relations is measurable, using queries with varying numbers of joins.

It does not sound like it would be a great deal of effort to look
through the rangetable for duplicate Oids and only do further
processing to attempt self-join removal if there are any. However, if
that effort happened to slow down all queries by, say, 5%, then perhaps
it would be a bad idea. People's opinions don't carry much weight in
arguments about this; unbiased and reproducible benchmarks should be
used as evidence to support the discussion. Doing worst-case and
average-case benchmarks up front will save you time, as someone will
almost certainly ask for them if you don't do it.
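
For instance, two made-up test queries (a sketch only, hypothetical table
names), one where the removal should fire and one where the duplicate-Oid
check is pure overhead; planning cost could be measured with EXPLAIN ANALYZE
or pgbench:

    -- Should benefit: the same table joined to itself on a unique key, so
    -- the planner could collapse the joins into a single scan.
    SELECT t1.*
    FROM some_table t1
    JOIN some_table t2 ON t2.id = t1.id
    JOIN some_table t3 ON t3.id = t1.id;

    -- Overhead-only: all relations are distinct, so scanning the rangetable
    -- for duplicate Oids finds nothing and any added planning time is lost.
    SELECT a.*
    FROM table_a a
    JOIN table_b b ON b.a_id = a.id
    JOIN table_c c ON c.a_id = a.id;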

(I've not been following the thread for the patch)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services