Q on SQL Performance tuning

2019-01-27 Thread Bhupathi, Kaushik (CORP)
Hi Team, I have a few questions on SQL performance tuning.


1)  Is there any SQL monitoring report available in PostgreSQL, similar to
Oracle's? The highlight of that report is that it shows the % of time spent on
CPU and I/O, and what % of the overall execution each step took.

2)  Is there any way to see the historical execution-plan details of a
particular SQL statement? Per my understanding so far, since there is no
shared-pool concept as in Oracle, every execution demands a new hard parse.
However, I wanted to check with the experts whether any extension is available
for this.


Thanks!
-Kaushik



Re: Q on SQL Performance tuning

2019-01-27 Thread legrand legrand
Hi,

There are many tools:
- the (core) extension pg_stat_statements will give you information on SQL
  executions (see the example query just below),
- the extension pgsentinel (https://github.com/pgsentinel/pgsentinel)
  gives the same results as Oracle's ASH view,
- the Java front end PASH Viewer (https://github.com/dbacvetkov/PASH-Viewer)
  gives a nice view of CPU and IO per query,
- the extension pg_stat_sql_plans (alpha) gives everything pg_stat_statements
  does and much more (parsing time, plan id, plan text, ...).
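
To illustrate the first one, a minimal sketch of enabling and querying
pg_stat_statements (column names are from the PostgreSQL 11 era,
track_io_timing must be on for the I/O columns to be filled in, and
/path/to/data is a placeholder):

    # load the extension and enable I/O timing, then restart
    psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements'"
    psql -c "ALTER SYSTEM SET track_io_timing = on"
    pg_ctl restart -D /path/to/data

    # create the extension once per database, then look at the top consumers
    psql -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements"
    psql -c "
      SELECT query, calls, total_time,
             blk_read_time + blk_write_time AS io_time
        FROM pg_stat_statements
       ORDER BY total_time DESC
       LIMIT 10;"

total_time minus io_time (both in milliseconds) gives a rough idea of time
spent outside I/O, which is the closest equivalent to the CPU/IO split asked
about above.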

Regards
PAscal






Re: Q on SQL Performance tuning

2019-01-27 Thread Justin Pryzby
On Sun, Jan 27, 2019 at 08:43:15AM +0000, Bhupathi, Kaushik (CORP) wrote:
> 2)  Is there any way to see the historical execution-plan details of a
> particular SQL statement? Per my understanding so far, since there is no
> shared-pool concept as in Oracle, every execution demands a new hard parse.
> However, I wanted to check with the experts whether any extension is
> available for this.

There's also auto_explain, although I think that's typically configured to only
output plans for queries which run longer than a minimum duration.
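
For reference, a minimal sketch of that configuration (parameter names are
from the auto_explain documentation; the threshold is only an example value):

    # postgresql.conf -- load auto_explain and log plans of slow queries
    shared_preload_libraries = 'auto_explain'
    auto_explain.log_min_duration = '250ms'  # log plans for anything slower
    auto_explain.log_analyze = on            # actual rows/timings (adds overhead)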

Justin



Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-27 Thread Justin Pryzby
On Sun, Jan 27, 2019 at 01:09:16PM +0530, Saurabh Nanda wrote:
> It seems that PGOPTIONS="-c synchronous_commit=off" has a significant
> impact. However, I still cannot understand why the TPS for the optimised
> case is LOWER than the default for higher concurrency levels!

Do you know which of the settings is causing the lower TPS?

I suggest checking shared_buffers first.

If you haven't done it, disabling THP and KSM can resolve performance issues,
especially with large RAM areas like shared_buffers, at least with older kernels.
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com
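
A sketch of how that is usually done on Linux, in case it helps (run as root;
the sysfs paths can differ by distro and kernel, and the settings are lost on
reboot unless persisted):

    # disable transparent hugepages
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # disable KSM and unmerge any pages it has already shared
    echo 2 > /sys/kernel/mm/ksm/run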

Justin



Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-27 Thread Saurabh Nanda
> Do you know which of the settings is causing the lower TPS?
>
> I suggest checking shared_buffers first.

I'm trying to find out, but re-running the benchmarks while changing one
config setting at a time is taking a lot of time. Thanks for the tip about
shared_buffers.
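
In case it helps anyone following along, this is roughly the loop I'm running
(the setting names, values, and paths here are just placeholders for my actual
list):

    # apply one optimised setting at a time on top of the defaults,
    # restart, and re-run the same pgbench workload
    for s in "shared_buffers='8GB'" "wal_compression=on"; do
        psql -c "ALTER SYSTEM SET ${s%%=*} = ${s#*=}"
        pg_ctl restart -D /path/to/data -w
        pgbench -c 12 -j 12 -T 300 bench_db
        psql -c "ALTER SYSTEM RESET ${s%%=*}"   # back to default for next run
    done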


> If you haven't done it, disabling THP and KSM can resolve performance
> issues, especially with large RAM areas like shared_buffers, at least with
> older kernels.
>
> https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com


Is this a well-known performance "hack"? Is there any reason why it is not
mentioned at https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server ?
Are the stability implications of fiddling with THP and KSM well known?
Also, wrt KSM, my understanding was that when a process forks, the process's
memory is "copy on write" anyway, right? What other kind of pages would
end up being de-duplicated by ksmd? (Caveat: this is the first time I'm
hearing about KSM, and my knowledge is based on a single reading of
https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html )

-- Saurabh.


Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-27 Thread Jeff Janes
>> You could also try pg_test_fsync to get low-level information, to
>> supplement the high level you get from pgbench.
>
> Thanks for pointing me to this tool. I never knew pg_test_fsync existed!
> I've run `pg_test_fsync -s 60` two times and this is the output -
> https://gist.github.com/saurabhnanda/b60e8cf69032b570c5b554eb50df64f8
> I'm not sure what to make of it.

I don't know what to make of that either.  I'd expect fdatasync using two
8kB writes to be about the same throughput as using one 8kB write, but
instead it is 4 times slower.  Also, I'd expect open_datasync to get slower
by a factor of 2, not a factor of 8, when going from one to two 8kB writes
(that is not directly relevant, as you aren't using open_datasync, but is
curious nonetheless).  Is this reproducible with different run lengths?  I
wonder if your write cache (or something) gets "tired" during the first
part of pg_test_fsync and thus degrades the subsequent parts of the test.
I would say something in your IO stack is not optimal, maybe some component
is "consumer grade" rather than "server grade".  Maybe you can ask Hetzner
about that.
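
One way to probe that, as a sketch: repeat the test with increasing run
lengths and see whether the short runs agree with the long one (the output
file path is just a scratch location):

    for secs in 2 10 60; do
        pg_test_fsync -s $secs -f /path/to/data/fsync_probe.out
    done

If throughput only collapses at the longer durations, that would point at
something in the write path getting saturated over time.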


>> The effects of max_wal_size are going to depend on how you have IO
>> configured, for example does pg_wal share the same devices and controllers
>> as the base data?  It is mostly about controlling disk usage and
>> crash-recovery performance, neither of which is of primary importance to
>> pgbench performance.
>
> The WAL and the data directory reside on the same SSD disk -- is this a
> bad idea?
>

If you are trying to squeeze out every last bit of performance, then I
think it is a bad idea.  Or at least, it is something to try both ways and
see.  The flushing that occurs during checkpoints and the flushing that
occurs for every commit can interfere with each other.
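
A sketch of the experiment, assuming a second SSD mounted at /mnt/wal (the
paths are placeholders, and the server must be stopped while moving the
directory; on PostgreSQL 10+ it is called pg_wal):

    pg_ctl stop -D /path/to/data
    mv /path/to/data/pg_wal /mnt/wal/pg_wal
    ln -s /mnt/wal/pg_wal /path/to/data/pg_wal   # standard symlink technique
    pg_ctl start -D /path/to/data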


> I was under the impression that smaller values for max_wal_size cause
> pg-server to do "maintenance work" related to WAL rotation, etc. more
> frequently and would lead to lower pgbench performance.

If you choose ridiculously small values it would.  But once the value is
sufficient, increasing it further wouldn't do much.  Given your low level
of throughput, I would think the default is already sufficient.
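
One way to confirm the default is sufficient, sketched here: compare timed
vs. requested checkpoints before and after a run (columns as they exist in
the PostgreSQL 11-era pg_stat_bgwriter view); a burst of requested
checkpoints during the benchmark would mean max_wal_size is being hit:

    psql -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"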

Thanks for including the storage info.  Nothing about it stands out to me
as either good or bad, but I'm not a hardware maven; hopefully one is
reading along and will speak up.


> PS: Cc-ing the list back again because I assume you didn't intend for your
> reply to be private, right?
>

Yes, I had intended to include the list but hit the wrong button, sorry.

Cheers,

Jeff



Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-27 Thread Jeff Janes
On Sun, Jan 27, 2019 at 2:39 AM Saurabh Nanda wrote:

>> PGOPTIONS="-c synchronous_commit=off" pgbench -T 3600 -P 10
>>
>> I am currently running all my benchmarks with synchronous_commit=off and
>> will get back with my findings.
>
> It seems that PGOPTIONS="-c synchronous_commit=off" has a significant
> impact.

It is usually not acceptable to run applications with
synchronous_commit=off, so once you have identified that the bottleneck is
in implementing synchronous_commit=on, you probably need to take a deep
dive into your hardware to figure out why it isn't performing the way you
need/want/expect it to.  Tuning the server under synchronous_commit=off
when you don't intend to run your production server with that setting is
unlikely to be fruitful.
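
(As an aside, if only some of your transactions can tolerate losing the most
recent commits, synchronous_commit can also be relaxed per transaction rather
than server-wide; a sketch, where the table name is hypothetical:)

    psql <<'SQL'
    BEGIN;
    SET LOCAL synchronous_commit = off;  -- applies to this transaction only
    INSERT INTO audit_log (msg) VALUES ('low-value row');
    COMMIT;
    SQL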


> However, I still cannot understand why the TPS for the optimised case is
> LOWER than the default for higher concurrency levels!

In case you do intend to run with synchronous_commit=off, or if you are
just curious:  running with a very high number of active connections often
reveals subtle bottlenecks and interactions, and is very dependent on your
hardware.  Unless you actually intend to run your server with
synchronous_commit=off and with a large number of active connections, it is
probably not worth investigating this.  You can make a hobby of it, of
course, but it is a time-consuming hobby to have.  If you do want to, I
think you should start out with your optimized settings and revert them one
at a time to find the one that caused the performance regression.

I'm more interested in the low end; you should do much better than those
reported numbers when clients=1 and synchronous_commit=off with the data on
SSD.  I think you said that pgbench is running on a different machine than
the database, so perhaps it is just network overhead that is keeping this
value down.  What happens if you run them on the same machine?
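
A sketch of the two runs, with placeholder host and database names:

    # over the network, TCP to the database host
    pgbench -h db-host -U postgres -c 1 -T 60 bench_db

    # on the database server itself, over the local Unix socket
    pgbench -c 1 -T 60 bench_db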

> +--------+---------------------+------------------------+
> | client | Mostly defaults [1] | Optimised settings [2] |
> +--------+---------------------+------------------------+
> | 1      | 80-86               | 169-180                |
> +--------+---------------------+------------------------+

Cheers,

Jeff


Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-27 Thread Saurabh Nanda
>
> It is usually not acceptable to run applications with
> synchronous_commit=off, so once you have identified that the bottleneck is
> in implementing synchronous_commit=on, you probably need to take a deep
> dive into your hardware to figure out why it isn't performing the way you
> need/want/expect it to.  Tuning the server under synchronous_commit=off
> when you don't intend to run your production server with that setting is
> unlikely to be fruitful.
>

I do not intend to run the server with synchronous_commit=off, but based on
my limited knowledge, I'm wondering if all these observations are somehow
related and are caused by the same underlying bottleneck (or
misconfiguration):

1) At higher concurrency levels, TPS for synchronous_commit=off is lower
for optimised settings when compared to default settings
2) At ALL concurrency levels, TPS for synchronous_commit=on is lower for
optimised settings (irrespective of shared_buffers value), compared to
default settings
3) At higher concurrency levels, optimised + synchronous_commit=on +
shared_buffers=2G has HIGHER TPS than optimised + synchronous_commit=off +
shared_buffers=8G

Here are the (completely counter-intuitive) numbers for these observations:

+--------+--------------+---------------+---------------+-----------------+
|        |           synchronous_commit=on              | sync_commit=off |
+--------+--------------+---------------+---------------+-----------------+
| client | Defaults [1] | Optimised [2] | Optimised [2] | Optimised [2]   |
|        |              | (buffers=2G)  | (buffers=8G)  | (buffers=8G)    |
+--------+--------------+---------------+---------------+-----------------+
| 1      | 80-86        | 74-77         | 75-75         | 169-180         |
+--------+--------------+---------------+---------------+-----------------+
| 6      | 350-376      | 301-304       | 295-300       | 1265-1397       |
+--------+--------------+---------------+---------------+-----------------+
| 12     | 603-619      | 476-488       | 485-493       | 1746-2352       |
+--------+--------------+---------------+---------------+-----------------+
| 24     | 947-1015     | 678-739       | 723-770       | 1869-2518       |
+--------+--------------+---------------+---------------+-----------------+
| 48     | 1435-1512    | 950-1043      | 1029-1086     | 1912-2818       |
+--------+--------------+---------------+---------------+-----------------+
| 96     | 1769-1811    | 3337-3459     | 1302-1346     | 1546-1753       |
+--------+--------------+---------------+---------------+-----------------+
| 192    | 1857-1992    | 3613-3715     | 1269-1345     | 1332-1508       |
+--------+--------------+---------------+---------------+-----------------+
| 384    | 1667-1793    | 3180-3300     | 1262-1364     | 1356-1450       |
+--------+--------------+---------------+---------------+-----------------+



>
> In case you do intend to run with synchronous_commit=off, or if you are
> just curious:  running with a very high number of active connections often
> reveals subtle bottlenecks and interactions, and is very dependent on your
> hardware.  Unless you actually intend to run our server with
> synchronous_commit=off and with a large number of active connections, it is
> probably not worth investigating this.
>

Please see the table above. I'm digging deeper into this because of
observation (2) above, i.e. I am unable to come up with any optimised
setting that performs better than the default settings for the concurrency
levels that I care about (100-150).


> I'm more interested in the low end, you should do much better than those
> reported numbers when clients=1 and synchronous_commit=off with the data on
> SSD.  I think you said that pgbench is running on a different machine than
> the database, so perhaps it is just network overhead that is keeping this
> value down.  What happens if you run them on the same machine?
>

I'm currently running this, but the early numbers are surprising. For
client=1, the numbers for optimised settings + shared_buffers=2G are:

-- pgbench run over a 1Gbps network: 74-77 tps
-- pgbench run on the same machine: 152-153 tps (is this absolute number
good enough given my hardware?)

Is the 1 Gbps network the bottleneck? Does it explain the three observations
given above? I'll wait for the current set of benchmarks to finish, then
re-run them over the network and monitor network utilisation.
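
As a back-of-envelope check (assuming these early numbers hold): 152 tps on
the same machine is ~6.6 ms per transaction, while 75 tps over the network is
~13.3 ms, a difference of ~6.7 ms. The default pgbench transaction issues
seven statements, i.e. seven network round trips, so that difference works
out to roughly 1 ms of latency per statement. I'll compare that against the
actual ping time between the two hosts to confirm or rule out the network.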

[1] "Mostly default" settings are 



Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?

2019-01-27 Thread Saurabh Nanda
All this benchmarking has led me to a philosophical question: why does PG
need shared_buffers in the first place? What's wrong with letting the OS do
the caching/buffering? Isn't it optimised for this kind of stuff?