Fastest way to clone schema ~1000x

2024-02-25 Thread Emiel Mols
Hello,

To improve our unit and end-to-end testing performance, we are looking to
optimize initialization of around 500-1000 database *schemas* from a
schema.sql file.

Background: in postgres, you cannot change databases on
existing connections, and using schemas would allow us to minimize the
amount of active connections needed from our test runners, as we can reuse
connections by updating search_path. In our current database-per-test setup
we see that there is around a ~30% (!!) total CPU overhead in
native_queued_spin_lock_slowpath (as profiled with linux perf), presumably
because of the high connection count. We run ~200 tests in parallel to
saturate a 128 core machine.

In the schema-per-test setup, however, it becomes harder to cheaply create
the schema. Before we could `create database test001 with template
testbase` to set up the database for a test, and this was reasonably fast.
Re-inserting a schema with ~500 table/index definitions across 500 test
schema's is prohibitively expensive (around 1000ms per schema insertion
means we're wasting 500 cpu-seconds, and there seems to be quite some lock
contention too). Linux perf shows that in this setup we're reducing the
native_queued_spin_lock_slowpath overhead to around 10%, but total test
time is still slower due to all schema initialization being done. Also it
feels quite elegant functions and types can be reused between tests.

Does anyone have some insights or great ideas :)? Also pretty curious to
the fundamental reason why having high/changing connection counts to
postgres results in this much (spin)lock contention (or perhaps we're doing
something wrong in either our configuration or measurements?).

An alternative we haven't explored yet is to see if we can use pgbouncer or
other connection pooler to mitigate the 30% issue (set limits so there are
only ever X connections to postgres in total, and perhaps max Y per
test/database). This does add another piece of infrastructure/complexity,
so not really prefered.

Best,

Emiel


Re: Fastest way to clone schema ~1000x

2024-02-25 Thread Emiel Mols
Thanks, as indicated we're using that right now. The 30% spinlock overhead
unfortunately persists.

- Fsync was already disabled, too. Complete postgresql.conf used in testing:
listen_addresses = ''
max_connections = 2048
unix_socket_directories = '..'
shared_buffers = 128MB
log_line_prefix = ''
synchronous_commit = 'off'
wal_level = 'minimal'

- linux perf report comparing schema-per-test vs database-per-test:
https://ibb.co/CW5w2MW

- Emiel


On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule 
wrote:

> Hi
>
> po 26. 2. 2024 v 7:28 odesílatel Emiel Mols  napsal:
>
>> Hello,
>>
>> To improve our unit and end-to-end testing performance, we are looking to
>> optimize initialization of around 500-1000 database *schemas* from a
>> schema.sql file.
>>
>> Background: in postgres, you cannot change databases on
>> existing connections, and using schemas would allow us to minimize the
>> amount of active connections needed from our test runners, as we can reuse
>> connections by updating search_path. In our current database-per-test setup
>> we see that there is around a ~30% (!!) total CPU overhead in
>> native_queued_spin_lock_slowpath (as profiled with linux perf), presumably
>> because of the high connection count. We run ~200 tests in parallel to
>> saturate a 128 core machine.
>>
>> In the schema-per-test setup, however, it becomes harder to cheaply
>> create the schema. Before we could `create database test001 with template
>> testbase` to set up the database for a test, and this was reasonably fast.
>> Re-inserting a schema with ~500 table/index definitions across 500 test
>> schema's is prohibitively expensive (around 1000ms per schema insertion
>> means we're wasting 500 cpu-seconds, and there seems to be quite some lock
>> contention too). Linux perf shows that in this setup we're reducing the
>> native_queued_spin_lock_slowpath overhead to around 10%, but total test
>> time is still slower due to all schema initialization being done. Also it
>> feels quite elegant functions and types can be reused between tests.
>>
>> Does anyone have some insights or great ideas :)? Also pretty curious to
>> the fundamental reason why having high/changing connection counts to
>> postgres results in this much (spin)lock contention (or perhaps we're doing
>> something wrong in either our configuration or measurements?).
>>
>> An alternative we haven't explored yet is to see if we can use pgbouncer
>> or other connection pooler to mitigate the 30% issue (set limits so there
>> are only ever X connections to postgres in total, and perhaps max Y per
>> test/database). This does add another piece of infrastructure/complexity,
>> so not really prefered.
>>
>
> For testing
>
> a) use templates - CREATE DATABASE test TEMPLATE some;
>
> b) disable fsync (only for testing!!!)
>
> Regards
>
> Pavel
>
>
>> Best,
>>
>> Emiel
>>
>


Re: Fastest way to clone schema ~1000x

2024-02-26 Thread Emiel Mols
We've experimented with shared buffers to no effect.

The 2048 we actually need for our test setup. The way this works is that we
have a single preforked backend for all tests where each backend worker
maintains persistent connections *per test* (in database-per-test), so with
say 50 backend processes, they might each meantain ~10-50 connections to
postgres. That's the reason for looking into schema-per-test that would
require only 1 persistent connection per worker and possibly only changing
the search_path to handle a request (and hencefort cheap initialization of
schema.sql being a new impediment).

But based on your answer, we will do some more research into pooling these
requests over pgbouncer with pool_mode=transaction. That should multiplex
all these backend->bouncer connections over a lot less connections to
postgres itself.

Thanks!

On Mon, Feb 26, 2024 at 2:14 PM Pavel Stehule 
wrote:

>
>
> po 26. 2. 2024 v 8:08 odesílatel Emiel Mols  napsal:
>
>> Thanks, as indicated we're using that right now. The 30% spinlock
>> overhead unfortunately persists.
>>
>
> try to increase shared_buffer
>
> 128MB can be too low
>
> max_connection = 2048 - it unrealistic high
>
>
>> - Fsync was already disabled, too. Complete postgresql.conf used in
>> testing:
>> listen_addresses = ''
>> max_connections = 2048
>> unix_socket_directories = '..'
>> shared_buffers = 128MB
>> log_line_prefix = ''
>> synchronous_commit = 'off'
>> wal_level = 'minimal'
>>
>> - linux perf report comparing schema-per-test vs database-per-test:
>> https://ibb.co/CW5w2MW
>>
>> - Emiel
>>
>>
>> On Mon, Feb 26, 2024 at 1:36 PM Pavel Stehule 
>> wrote:
>>
>>> Hi
>>>
>>> po 26. 2. 2024 v 7:28 odesílatel Emiel Mols  napsal:
>>>
>>>> Hello,
>>>>
>>>> To improve our unit and end-to-end testing performance, we are looking
>>>> to optimize initialization of around 500-1000 database *schemas* from a
>>>> schema.sql file.
>>>>
>>>> Background: in postgres, you cannot change databases on
>>>> existing connections, and using schemas would allow us to minimize the
>>>> amount of active connections needed from our test runners, as we can reuse
>>>> connections by updating search_path. In our current database-per-test setup
>>>> we see that there is around a ~30% (!!) total CPU overhead in
>>>> native_queued_spin_lock_slowpath (as profiled with linux perf), presumably
>>>> because of the high connection count. We run ~200 tests in parallel to
>>>> saturate a 128 core machine.
>>>>
>>>> In the schema-per-test setup, however, it becomes harder to cheaply
>>>> create the schema. Before we could `create database test001 with template
>>>> testbase` to set up the database for a test, and this was reasonably fast.
>>>> Re-inserting a schema with ~500 table/index definitions across 500 test
>>>> schema's is prohibitively expensive (around 1000ms per schema insertion
>>>> means we're wasting 500 cpu-seconds, and there seems to be quite some lock
>>>> contention too). Linux perf shows that in this setup we're reducing the
>>>> native_queued_spin_lock_slowpath overhead to around 10%, but total test
>>>> time is still slower due to all schema initialization being done. Also it
>>>> feels quite elegant functions and types can be reused between tests.
>>>>
>>>> Does anyone have some insights or great ideas :)? Also pretty curious
>>>> to the fundamental reason why having high/changing connection counts to
>>>> postgres results in this much (spin)lock contention (or perhaps we're doing
>>>> something wrong in either our configuration or measurements?).
>>>>
>>>> An alternative we haven't explored yet is to see if we can use
>>>> pgbouncer or other connection pooler to mitigate the 30% issue (set limits
>>>> so there are only ever X connections to postgres in total, and perhaps max
>>>> Y per test/database). This does add another piece of
>>>> infrastructure/complexity, so not really prefered.
>>>>
>>>
>>> For testing
>>>
>>> a) use templates - CREATE DATABASE test TEMPLATE some;
>>>
>>> b) disable fsync (only for testing!!!)
>>>
>>> Regards
>>>
>>> Pavel
>>>
>>>
>>>> Best,
>>>>
>>>> Emiel
>>>>
>>>


Re: Fastest way to clone schema ~1000x

2024-02-26 Thread Emiel Mols
On Mon, Feb 26, 2024 at 3:50 PM Daniel Gustafsson  wrote:

> There is a measurable overhead in connections, regardless of if they are
> used
> or not.  If you are looking to squeeze out performance then doing more over
> already established connections, and reducing max_connections, is a good
> place
> to start.
>

Clear, but with database-per-test (and our backend setup), it would have
been *great* if we could have switched database on the same connection
(similar to "USE xxx" in mysql). That would limit the connections to the
amount of workers, not multiplied by tests.

Even with a pooler, we're still going to be maintaining 1000s of
connections from the backend workers to the pooler. I would expect this to
be rather efficient, but still unnecessary. Also, both pgbouncer/pgpool
don't seem to support switching database in-connection (they could have
implemented the aforementioned "USE" statement I think!). [Additionally
we're using PHP that doesn't seem to have a good shared memory pool
implementation -- pg_pconnect is pretty buggy].

I'll continue with some more testing. Thanks for now!


Re: Fastest way to clone schema ~1000x

2024-02-29 Thread Emiel Mols
An update on this for anyone else who has similar issues/considerations:
- As suggested, we are sticking to 1 database-per-test, and not a
schema-per-test.
- We moved our testing setup to maintain at most 1 connection per backend
worker. If a request for a different test comes in, we close and reconnect
to the different DB (in the future, we might test some affinity based
routing here). This does empirically seem to improve performance around
~5-10% over maintaining 1000+ connections. It probably helps that the
connection is local over a unix domain socket. We now set max_connections
in testing to 512, where we usually max out at ~200.
- We tried PgBouncer, which seems to work as well, but doesn't improve
performance much compared to the previous point. (Also it adds quite some
complexity, we still need to maintain the 1000+ connections to the pooler
as it doesn't support changing databases and the user names need to be
known/configured ahead of time which is annoying for us)
- We looked more into the `native_queued_spin_lock_slowpath` issue,
distilling a flamegraph to identify the actual caller(s): our latest
thinking now is that this is an artifact of measurement with perf/bpf (!!),
so the concerns with this can be ignored otherwise. Flame graph:
https://ibb.co/sW34mgq. Reason to think so is that
`bpf_trampoline_6442485415_0` is in its call path. (Please do reach out if
you think this is wrong)

Thanks again for the quick help.

- Emiel

On Mon, Feb 26, 2024 at 5:10 PM Adrian Klaver 
wrote:

> On 2/26/24 01:06, Emiel Mols wrote:
> > On Mon, Feb 26, 2024 at 3:50 PM Daniel Gustafsson  > <mailto:dan...@yesql.se>> wrote:
> >
> > There is a measurable overhead in connections, regardless of if they
> > are used
> > or not.  If you are looking to squeeze out performance then doing
> > more over
> > already established connections, and reducing max_connections, is a
> > good place
> > to start.
> >
> >
> > Clear, but with database-per-test (and our backend setup), it would have
> > been *great* if we could have switched database on the same connection
> > (similar to "USE xxx" in mysql). That would limit the connections to the
> > amount of workers, not multiplied by tests.
>
> That is because:
>
> https://dev.mysql.com/doc/refman/8.3/en/glossary.html#glos_schema
>
> "In MySQL, physically, a schema is synonymous with a database. You can
> substitute the keyword SCHEMA instead of DATABASE in MySQL SQL syntax,
> for example using CREATE SCHEMA instead of CREATE DATABASE. "
>
>
> >
> > Even with a pooler, we're still going to be maintaining 1000s of
> > connections from the backend workers to the pooler. I would expect this
> > to be rather efficient, but still unnecessary. Also, both
> > pgbouncer/pgpool don't seem to support switching database in-connection
> > (they could have implemented the aforementioned "USE" statement I
> > think!). [Additionally we're using PHP that doesn't seem to have a good
> > shared memory pool implementation -- pg_pconnect is pretty buggy].
> >
> > I'll continue with some more testing. Thanks for now!
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>