Re: Postgresql TPS Bottleneck

2022-04-20 Thread wakandavision
Hi, thanks for your answer.

We have a Grafana instance monitoring all those metrics, but no one I have
asked so far could identify an obvious bottleneck.
However, I have done further experiments to see if we are missing something.

While running the benchmark with our application I ran tools on the
DB node to saturate its resources. These were cpuburn, iperf and fio.
While cpuburn did result in a small drop in Postgres TPS, it was nothing
unexpected. However, iperf and fio did not have any impact at all
(except iperf once it pushed more than our 10 Gbps link can carry -
clearly). The disks were utilized 100% but Postgres stayed at about
65k TPS.
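
Roughly, the load generators looked like this (illustrative invocations
only, not the exact ones; the peer host and file path are placeholders,
and stress-ng stands in here for cpuburn):

  # CPU: keep all 64 cores busy
  stress-ng --cpu 64 --timeout 300s

  # network: push traffic over the 10 Gbps link to another node
  iperf3 -c <other-node> -P 8 -t 300

  # disk: random 4k writes against the data disk
  fio --name=burn --filename=/data/fio.tmp --size=10G --rw=randwrite \
      --bs=4k --iodepth=64 --direct=1 --ioengine=libaio \
      --time_based --runtime=300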

The next thing I did was to start two independent Postgres instances on
the same server and run independent client applications against each of
them. This resulted in our application getting almost double the TPS
compared to running a single instance (from 13k to 23k) - each Postgres
instance had about 45k TPS, which did not increase (?).
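
For context, the second instance was just a separate cluster on another
port, started along these lines (data directory and port are placeholders;
on Debian, pg_createcluster achieves the same):

  /usr/lib/postgresql/13/bin/initdb -D /data/pg13_b
  /usr/lib/postgresql/13/bin/pg_ctl -D /data/pg13_b -l /data/pg13_b/log \
      -o "-p 5433" start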

I think what's also interesting is that our DB server reaches its TPS peak
at about 80 clients (more clients make the TPS go down again), while
most benchmarks I find on the internet peak at about 400-600 clients.

Does anyone have an idea what might be the problem?
Maybe I am missing a kernel/Postgres configuration parameter?

From: Tomas Vondra 
Sent: Thursday, March 31, 2022 9:16 PM
To: [email protected] ; 
[email protected] 
Subject: Re: Postgresql TPS Bottleneck



On 3/31/22 13:50, [email protected] wrote:
> Hi everyone,
>
> I am a bachelor's student writing my thesis about the scaling and
> performance of an application. The application is using postgresql as a
> database, but we currently can't scale any further as postgres seems to
> be hitting a limit.
>
> With the application, as well as with pgbench, we don't get more than
> (max) 70k TPS on postgres. But the server's resources are not utilized
> completely (more below).
>
> I've tried many different configurations but none of them had any major
> performance impact (except for turning fsync and synchronous_commit off).
>
> This is the (custom) configuration I am using:
>
> shared_buffers=65551953kB
> effective_cache_size=147491895kB
> huge_pages=on
> min_wal_size=20GB
> max_wal_size=200GB
> wal_buffers=1GB
> max_wal_senders=0
> archive_mode=off
> wal_level=minimal
> work_mem=2GB
> maintenance_work_mem=4GB
> checkpoint_completion_target=0.9
> checkpoint_timeout = 30min
> random_page_cost=1.1
> bgwriter_flush_after = 2MB
> effective_io_concurrency = 200
> # Disabled just for performance experiments
> fsync = off
> synchronous_commit = off
> full_page_writes = on
> max_worker_processes=64
> max_parallel_workers=64
> max_parallel_workers_per_gather=10
> max_parallel_maintenance_workers=12
>
> The system is as follows:
>
> * 64 Cores (Intel Xeon Gold 6130 (Skylake, 2.10GHz, 2 CPUs/node, 16
> cores/CPU))
> * 192 GiB RAM (12 * 16GiB DIMM DDR4 Synchronous Registered (Buffered)
> 2666 MHz (0.4 ns))
> * 2 * SSD SATA Samsung MZ7KM240HMHQ0D3 (one is used for the WAL and the
> other for the data)
> * 10 Gbps network link
> * OS: Debian 11
> * Postgres 13 from apt
>
> (I've also written a stackoverflow post about it:
> https://stackoverflow.com/questions/71631348/postgresql-bottleneck-neither-cpu-network-nor-i-o)
>
> Below is just an example of the pgbench I ran:
>
> pgbench -i -s 50 -U postgres -h  -d 
> pgbench -c 64 -j 32 -t 10 -h  -U postgres 
>

I'd bet you need to use "pgbench -N" because the regular transaction
updates the "branch" table, and you only have 50 branches. Which
probably means a lot of conflicts and locking. The other thing you might
try is "-M prepared" which saves time on query planning.

FWIW I really doubt "fsync=off" will give you any meaningful results.

Maybe try assessing the hardware capability first, using tools like fio
to measure IOPS with different workloads.
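
Something along these lines, varying --rw, --bs and --iodepth to cover
different workloads (the file path is a placeholder):

  fio --name=iops --filename=/data/fio.tmp --size=10G --rw=randread \
      --bs=8k --iodepth=32 --direct=1 --ioengine=libaio \
      --time_based --runtime=60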

Then try pgbench with a single client, and finally increase the number
of clients and see how it behaves and compare it to what you expect.
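
A rough sketch of such a sweep (same placeholders as above):

  for c in 1 2 4 8 16 32 64 96 128; do
    pgbench -N -M prepared -c $c -j $c -T 60 -h <host> -U postgres <dbname>
  done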

In any case, every system has a bottleneck. You're clearly hitting one,
otherwise the numbers would go higher. Usually, it's either CPU bound,
in which case "perf top" might tell us more, or it's IO bound, in which
case try e.g. "iostat -x -k 1" or something.

regards

--
Tomas

Re: significant jump in sql statement timing for on server vs a remote connection

2022-04-20 Thread Sbob


On 4/19/22 22:17, Jeff Janes wrote:

On Tue, Apr 19, 2022 at 5:00 PM Sbob  wrote:


However if we move the file to another server in the same network and
run with a psql -h then it runs for more than 10min.


What is the ping time?  Packet loss? You can't take for granted that 
the network is good and fast just because they are on the same LAN.


Cheers,

Jeff



Here is the ping stats:

--- db-primary ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3000ms
rtt min/avg/max/mdev = 0.304/0.348/0.400/0.039 ms


This seems pretty good, yes? Anything else I could look at?



Re: Postgresql TPS Bottleneck

2022-04-20 Thread Jeff Janes
On Wed, Apr 20, 2022 at 5:13 AM  wrote:

>
> The next thing I did was to start two independent Postgres instances on
> the same server and run independent client applications against each of
> them. This resulted in our application getting almost double the TPS
> compared to running a single instance (from 13k to 23k) - each Postgres
> instance had about 45k TPS, which did not increase (?).
>

How could that be?  Isn't there a one to one correspondence between app
progress and PostgreSQL transactions?  How could one almost double while
the other did not increase?  Anyway, 2x45 does seem like an increase
(smallish) over 65.

Your bottleneck for pgbench may be IPC/context switches.  I noticed that -S
did about 7 times more than the default, and it only makes one round trip
to the database while the default makes 7.

You could package up the different queries made by the default transaction
into one function call, in order to do the same thing but with fewer round
trips to the database. This would be an easy way to see if my theory is
true.  If it is, I don't know what that would mean for your app though, as
we know nothing about its structure.

I have a patch handy (attached) which implements this feature as the
builtin transaction "-b tpcb-func".  If you don't want to recompile
pgbench, you could dissect the patch to reimplement the same thing as a -f
style transaction instead.
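
If you go the -f route, a rough sketch of the idea (just an illustration of
the same approach, not what the attached patch does verbatim; the names are
arbitrary) would be a server-side function:

  CREATE FUNCTION tpcb_like(p_aid int, p_bid int, p_tid int, p_delta int)
  RETURNS int LANGUAGE plpgsql AS $$
  DECLARE
    v_abalance int;
  BEGIN
    -- the whole tpcb-like transaction in a single server-side call
    UPDATE pgbench_accounts SET abalance = abalance + p_delta
      WHERE aid = p_aid
      RETURNING abalance INTO v_abalance;
    UPDATE pgbench_tellers SET tbalance = tbalance + p_delta WHERE tid = p_tid;
    UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
      VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
    RETURN v_abalance;
  END;
  $$;

driven by a custom script (say tpcb_func.sql) that makes one round trip per
transaction:

  \set aid random(1, 100000 * :scale)
  \set bid random(1, 1 * :scale)
  \set tid random(1, 10 * :scale)
  \set delta random(-5000, 5000)
  SELECT tpcb_like(:aid, :bid, :tid, :delta);

run with something like "pgbench -M prepared -c 64 -j 32 -T 60 -f
tpcb_func.sql <dbname>".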

Note that packaging it up this way does violate the spirit of the
benchmark, as clearly someone is supposed to look at the results of the
first select before deciding to proceed with the rest of the transaction.
But you don't seem very interested in the spirit of the tpc-b benchmark,
just in using it as a tool to track down a bottleneck.

Cheers,

Jeff


pgbench_function_v13.patch
Description: Binary data


Re: significant jump in sql statement timing for on server vs a remote connection

2022-04-20 Thread Ranier Vilela
Em qua., 20 de abr. de 2022 às 12:16, Sbob 
escreveu:

>
> On 4/19/22 22:17, Jeff Janes wrote:
>
> On Tue, Apr 19, 2022 at 5:00 PM Sbob  wrote:
>
>>
>> However if we move the file to another server in the same network and
>> run with a psql -h then it runs for more than 10min.
>
>
> What is the ping time?  Packet loss? You can't take for granted that the
> network is good and fast just because they are on the same LAN.
>
> Cheers,
>
> Jeff
>
>
> Here is the ping stats:
>
> --- db-primary ping statistics ---
> 4 packets transmitted, 4 received, 0% packet loss, time 3000ms
>
3000 ms?
Are you sure there isn't any packet loss?

regards,
Ranier Vilela


Re: Postgresql TPS Bottleneck

2022-04-20 Thread wakandavision
Clearly, I have only supplied half of the information there. I'm really sorry
about that. The TPS measurement of the application does not in any way
correspond to the TPS of Postgres.
They are measured completely differently, but it's the measure we are actually
interested in, as we want to assess the scalability of the application.

What I wanted to show is that the server we are hosting Postgres on is not
bottlenecked (in an obvious way), as running two instances in parallel on the
same server gives us almost double the performance in our application and
double the resource usage on the DB server. But what is actually strange(?) is
that the TPS of Postgres does not change much, i.e. it's just 'distributed'
across the two instances.

It would seem like our application could not handle more throughput, but I did
the same with three instances, where we again ended up with 'only' double the
performance and the TPS of Postgres distributed across the three instances
(each client application running on an independent node).

I'm really getting frustrated here, as neither I nor anyone I have asked so
far has an explanation for this behavior.

From: Jeff Janes 
Sent: Wednesday, April 20, 2022 5:49 PM
To: [email protected] 
Cc: Tomas Vondra ; 
[email protected] 
Subject: Re: Postgresql TPS Bottleneck

On Wed, Apr 20, 2022 at 5:13 AM 
mailto:[email protected]>> wrote:

The next thing I did was to start two independent Postgres instances on
the same server and run independent client applications against each of
them. This resulted in our application getting almost double the TPS
compared to running a single instance (from 13k to 23k) - each Postgres
instance had about 45k TPS, which did not increase (?).

How could that be?  Isn't there a one to one correspondence between app 
progress and PostgreSQL transactions?  How could one almost double while the 
other did not increase?  Anyway, 2x45 does seem like an increase (smallish) 
over 65.

Your bottleneck for pgbench may be IPC/context switches.  I noticed that -S did 
about 7 times more than the default, and it only makes one round trip to the 
database while the default makes 7.

You could package up the different queries made by the default transaction into 
one function call, in order to do the same thing but with fewer round trips to 
the database. This would be an easy way to see if my theory is true.  If it is, 
I don't know what that would mean for your app though, as we know nothing about 
its structure.

I have a patch handy (attached) which implements this feature as the builtin 
transaction "-b tpcb-func".  If you don't want to recompile pgbench, you could 
dissect the patch to reimplement the same thing as a -f style transaction 
instead.

Note that packaging it up this way does violate the spirit of the benchmark, as 
clearly someone is supposed to look at the results of the first select before 
deciding to proceed with the rest of the transaction.  But you don't seem very 
interested in the spirit of the tpc-b benchmark, just in using it as a tool to 
track down a bottleneck.

Cheers,

Jeff


Re: Postgresql TPS Bottleneck

2022-04-20 Thread Benedict Holland
You wouldn't get an increase running two instances on the same server.
Distributed database servers are a complex application, and tuning them will
depend on storage and CPU capacity. It could be as simple as the bus. Are you
running this locally or in the cloud? Are you running this on a distributed
file system or across a network? There are a dozen different reasons why a
database would not be using 100% of capacity, from indexing to being disk,
bus or network bound.

Thanks,
Ben

On Wed, Apr 20, 2022, 1:27 PM  wrote:

> Clearly, I have only supplied half of the information there. I'm really
> sorry about that. The TPS measurement of the application does not in any
> way correspond to the TPS of Postgres.
> They are measured completely differently, but it's the measure we are
> actually interested in, as we want to assess the scalability of the
> application.
>
> What I wanted to show is that the server we are hosting Postgres on is not
> bottlenecked (in an obvious way), as running two instances in parallel on
> the same server gives us almost double the performance in our application
> and double the resource usage on the DB server. But what is actually
> strange(?) is that the TPS of Postgres does not change much, i.e. it's
> just 'distributed' across the two instances.
>
> It would seem like our application could not handle more throughput, but I
> did the same with three instances, where we again ended up with 'only'
> double the performance and the TPS of Postgres distributed across the
> three instances (each client application running on an independent node).
>
> I'm really getting frustrated here, as neither I nor anyone I have asked
> so far has an explanation for this behavior.
> --
> *From:* Jeff Janes 
> *Sent:* Wednesday, April 20, 2022 5:49 PM
> *To:* [email protected] 
> *Cc:* Tomas Vondra ;
> [email protected] <
> [email protected]>
> *Subject:* Re: Postgresql TPS Bottleneck
>
> On Wed, Apr 20, 2022 at 5:13 AM  wrote:
>
>
> The next thing I did was to start two independent Postgres instances on
> the same server and run independent client applications against each of
> them. This resulted in our application getting almost double the TPS
> compared to running a single instance (from 13k to 23k) - each Postgres
> instance had about 45k TPS, which did not increase (?).
>
>
> How could that be?  Isn't there a one to one correspondence between app
> progress and PostgreSQL transactions?  How could one almost double while
> the other did not increase?  Anyway, 2x45 does seem like an increase
> (smallish) over 65.
>
> Your bottleneck for pgbench may be IPC/context switches.  I noticed that
> -S did about 7 times more than the default, and it only makes one round
> trip to the database while the default makes 7.
>
> You could package up the different queries made by the default transaction
> into one function call, in order to do the same thing but with fewer round
> trips to the database. This would be an easy way to see if my theory is
> true.  If it is, I don't know what that would mean for your app though, as
> we know nothing about its structure.
>
> I have a patch handy (attached) which implements this feature as the
> builtin transaction "-b tpcb-func".  If you don't want to recompile
> pgbench, you could dissect the patch to reimplement the same thing as a -f
> style transaction instead.
>
> Note that packaging it up this way does violate the spirit of the
> benchmark, as clearly someone is supposed to look at the results of the
> first select before deciding to proceed with the rest of the transaction.
> But you don't seem very interested in the spirit of the tpc-b benchmark,
> just in using it as a tool to track down a bottleneck.
>
> Cheers,
>
> Jeff
>