Q on SQL Performance tuning
Hi Team, I have a few questions on SQL performance tuning.

1) Is there any SQL monitoring report available, like the one in Oracle?
The highlight of that report is that it shows the % of time spent on CPU
vs. IO, and what % of the overall execution each step took.

2) Is there any way to see the historical execution plan details of a
particular SQL statement? Per my understanding so far, since there is no
shared pool concept unlike Oracle, every execution demands a new hard
parse. However, I wanted to check with the experts whether any extension
is available for this.

Thanks!
-Kaushik
Re: Q on SQL Performance tuning
Hi,

There are many tools:

- (core) extension pg_stat_statements will give you information on SQL
  executions,
- extension pgsentinel (https://github.com/pgsentinel/pgsentinel) gives
  the same results as the Oracle ASH view,
- Java front end PASH Viewer (https://github.com/dbacvetkov/PASH-Viewer)
  gives a nice view of CPU/IO per query,
- extension pg_stat_sql_plans (alpha) gives all of pg_stat_statements and
  much more (parsing time, planid, plan text, ...).

Regards
PAscal
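For example, a minimal pg_stat_statements setup might look like the sketch
below. This assumes superuser access; the timing columns shown are the
PG 11-era names (total_time was later split into total_exec_time and
total_plan_time in PG 13):

    # Preload the extension (requires a server restart), then create it:
    psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements'"
    pg_ctl restart -D "$PGDATA"
    psql -c "CREATE EXTENSION IF NOT EXISTS pg_stat_statements"

    # Top 10 statements by cumulative execution time, with buffer counters
    # that hint at the CPU-vs-IO split per normalized query:
    psql -c "SELECT queryid, calls,
                    round(total_time::numeric, 2) AS total_ms,
                    round(mean_time::numeric, 2)  AS mean_ms,
                    shared_blks_hit, shared_blks_read
             FROM pg_stat_statements
             ORDER BY total_time DESC
             LIMIT 10"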
Re: Q on SQL Performance tuning
On Sun, Jan 27, 2019 at 08:43:15AM +, Bhupathi, Kaushik (CORP) wrote:
> 2) Is there any way to see the historical execution plan details of a
> particular SQL statement? Per my understanding so far, since there is no
> shared pool concept unlike Oracle, every execution demands a new hard
> parse. However, I wanted to check with the experts whether any extension
> is available for this.

There's also auto_explain, although I think that's typically configured to
only output plans for queries which run longer than a minimum duration.

Justin
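As a sketch, auto_explain can be tried per session without touching
postgresql.conf (the 250 ms threshold below is an arbitrary example; for
cluster-wide logging the module would instead go into
session_preload_libraries in postgresql.conf):

    # Load auto_explain for one session and log any plan slower than 250 ms:
    psql <<'SQL'
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '250ms';
    SET auto_explain.log_analyze = on;   -- include actual rows and timings
    SELECT count(*) FROM pg_class;       -- its plan appears in the server log
    SQL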
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
On Sun, Jan 27, 2019 at 01:09:16PM +0530, Saurabh Nanda wrote:
> It seems that PGOPTIONS="-c synchronous_commit=off" has a significant
> impact. However, I still can not understand why the TPS for the optimised
> case is LOWER than the default for higher concurrency levels!

Do you know which of the settings is causing lower TPS?
I suggest checking shared_buffers.

If you haven't done it, disabling THP and KSM can resolve performance
issues, especially with large RAM like shared_buffers, at least with older
kernels.
https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

Justin
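For reference, a rough sketch of checking and disabling both at runtime
(run as root; the sysfs paths vary somewhat by distro and kernel):

    # Transparent Huge Pages: show current mode, then disable
    cat /sys/kernel/mm/transparent_hugepage/enabled   # e.g. [always] madvise never
    echo never > /sys/kernel/mm/transparent_hugepage/enabled
    echo never > /sys/kernel/mm/transparent_hugepage/defrag

    # Kernel Samepage Merging: 1 = ksmd scanning, 0 = stopped
    cat /sys/kernel/mm/ksm/run
    echo 0 > /sys/kernel/mm/ksm/run

    # To persist across reboots, add transparent_hugepage=never to the
    # kernel command line (or re-apply the above from a boot-time unit).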
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
> Do you know which of the settings is causing lower TPS?
> I suggest checking shared_buffers.

I'm trying to find this, but it's taking a lot of time to re-run the
benchmarks changing one config setting at a time. Thanks for the tip
related to shared_buffers.

> If you haven't done it, disabling THP and KSM can resolve performance
> issues, especially with large RAM like shared_buffers, at least with
> older kernels.
> https://www.postgresql.org/message-id/20170718180152.GE17566%40telsasoft.com

Is this a well-known performance "hack"? Is there any reason why it is not
mentioned at https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server ?
Are the stability implications of fiddling with THP and KSM well-known?

Also, wrt KSM, my understanding was that when a process forks, the
process' memory is anyway "copy on write", right? What other kind of pages
would end up being de-duplicated by ksmd? (Caveat: this is the first time
I'm hearing about KSM and my knowledge is based off a single reading of
https://www.kernel.org/doc/html/latest/admin-guide/mm/ksm.html )

-- Saurabh.
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
> > You could also try pg_test_fsync to get low-level information, to
> > supplement the high level you get from pgbench.
>
> Thanks for pointing me to this tool. I never knew pg_test_fsync existed!
> I've run `pg_test_fsync -s 60` two times and this is the output -
> https://gist.github.com/saurabhnanda/b60e8cf69032b570c5b554eb50df64f8
> I'm not sure what to make of it?

I don't know what to make of that either. I'd expect fdatasync using two
8kB writes to be about the same throughput as using one 8kB write, but
instead it is 4 times slower. Also, I'd expect open_datasync to get slower
by a factor of 2, not a factor of 8, when going from one to two 8kB writes
(that is not directly relevant, as you aren't using open_datasync, but it
is curious nonetheless). Is this reproducible with different run lengths?
I wonder if your write cache (or something) gets "tired" during the first
part of pg_test_fsync and thus degrades the subsequent parts of the test.

I would say something in your IO stack is not optimal; maybe some
component is "consumer grade" rather than "server grade". Maybe you can
ask Hetzner about that.

> > The effects of max_wal_size are going to depend on how you have IO
> > configured, for example does pg_wal share the same devices and
> > controllers as the base data? It is mostly about controlling disk
> > usage and crash-recovery performance, neither of which is of primary
> > importance to pgbench performance.
>
> The WAL and the data-directory reside on the same SSD disk -- is this a
> bad idea?

If you are trying to squeeze out every last bit of performance, then I
think it is a bad idea. Or at least, it is something where you should try
the alternative and see. The flushing that occurs during checkpoints and
the flushing that occurs for every commit can interfere with each other.

> I was under the impression that smaller values for max_wal_size cause
> pg-server to do "maintenance work" related to wal rotation, etc. more
> frequently and would lead to lower pgbench performance.

If you choose ridiculously small values, it would. But once the value is
sufficient, increasing it further wouldn't do much. Given your low level
of throughput, I would think the default is already sufficient.

Thanks for including the storage info. Nothing about it stands out to me
as either good or bad, but I'm not a hardware maven; hopefully one will be
reading along and speak up.

> PS: Cc-ing the list back again because I assume you didn't intend for
> your reply to be private, right?

Yes, I had intended to include the list but hit the wrong button, sorry.

Cheers,
Jeff
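To act on both suggestions -- checking reproducibility across run lengths,
and trying pg_wal on a separate device -- a sketch like the following
could work (/mnt/waldisk is an assumed mount point for a second disk; the
server must be stopped before moving pg_wal):

    # Re-run pg_test_fsync with varying durations to see whether throughput
    # degrades as the test goes on (-s sets seconds per test):
    for secs in 5 15 60; do pg_test_fsync -s "$secs"; done

    # Relocate the WAL via a symlink (pg_wal in PG 10+, pg_xlog earlier):
    pg_ctl stop -D "$PGDATA"
    mv "$PGDATA/pg_wal" /mnt/waldisk/pg_wal
    ln -s /mnt/waldisk/pg_wal "$PGDATA/pg_wal"
    pg_ctl start -D "$PGDATA"
    # (For a fresh cluster, initdb --waldir=/mnt/waldisk/pg_wal does this
    # directly.)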
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
On Sun, Jan 27, 2019 at 2:39 AM Saurabh Nanda wrote:

> > PGOPTIONS="-c synchronous_commit=off" pgbench -T 3600 -P 10
> >
> > I am currently running all my benchmarks with synchronous_commit=off
> > and will get back with my findings.
>
> It seems that PGOPTIONS="-c synchronous_commit=off" has a significant
> impact.

It is usually not acceptable to run applications with
synchronous_commit=off, so once you have identified that the bottleneck is
in implementing synchronous_commit=on, you probably need to take a deep
dive into your hardware to figure out why it isn't performing the way you
need/want/expect it to. Tuning the server under synchronous_commit=off
when you don't intend to run your production server with that setting is
unlikely to be fruitful.

> However, I still can not understand why the TPS for the optimised case
> is LOWER than the default for higher concurrency levels!

In case you do intend to run with synchronous_commit=off, or if you are
just curious: running with a very high number of active connections often
reveals subtle bottlenecks and interactions, and is very dependent on your
hardware. Unless you actually intend to run your server with
synchronous_commit=off and with a large number of active connections, it
is probably not worth investigating this. You can make a hobby of it, of
course, but it is a time-consuming hobby to have. If you do want to, I
think you should start out with your optimized settings and revert them
one at a time to find the one that caused the performance regression.

I'm more interested in the low end. You should do much better than those
reported numbers when clients=1 and synchronous_commit=off with the data
on SSD. I think you said that pgbench is running on a different machine
than the database, so perhaps it is just network overhead that is keeping
this value down. What happens if you run them on the same machine?

> +--------+---------------------+------------------------+
> | client | Mostly defaults [1] | Optimised settings [2] |
> +--------+---------------------+------------------------+
> | 1      | 80-86               | 169-180                |
> +--------+---------------------+------------------------+

Cheers,
Jeff
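One way to run that same-machine comparison (a sketch; the host name and
database name are placeholders, and omitting -h makes pgbench connect over
the local Unix socket):

    # Over the 1 Gbps link, single client:
    pgbench -h db.example.com -c 1 -T 300 -P 10 bench

    # Same machine, local socket -- the TPS difference is the network cost:
    pgbench -c 1 -T 300 -P 10 bench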
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
> It is usually not acceptable to run applications with
> synchronous_commit=off, so once you have identified that the bottleneck
> is in implementing synchronous_commit=on, you probably need to take a
> deep dive into your hardware to figure out why it isn't performing the
> way you need/want/expect it to. Tuning the server under
> synchronous_commit=off when you don't intend to run your production
> server with that setting is unlikely to be fruitful.

I do not intend to run the server with synchronous_commit=off, but based
on my limited knowledge, I'm wondering if all these observations are
somehow related and are caused by the same underlying bottleneck (or
misconfiguration):

1) At higher concurrency levels, TPS for synchronous_commit=off is lower
   for optimised settings when compared to default settings.
2) At ALL concurrency levels, TPS for synchronous_commit=on is lower for
   optimised settings (irrespective of the shared_buffers value), compared
   to default settings.
3) At higher concurrency levels, optimised + synchronous_commit=on +
   shared_buffers=2G has HIGHER TPS than optimised +
   synchronous_commit=off + shared_buffers=8G.

Here are the (completely counter-intuitive) numbers for these observations:

+--------+----------------------------------------------+-----------------+
|        |            synchronous_commit=on             | sync_commit=off |
+--------+--------------+---------------+---------------+-----------------+
| client | Defaults [1] | Optimised [2] | Optimised [2] | Optimised [2]   |
|        |              | (buffers=2G)  | (buffers=8G)  | (buffers=8G)    |
+--------+--------------+---------------+---------------+-----------------+
|      1 | 80-86        | 74-77         | 75-75         | 169-180         |
|      6 | 350-376      | 301-304       | 295-300       | 1265-1397       |
|     12 | 603-619      | 476-488       | 485-493       | 1746-2352       |
|     24 | 947-1015     | 678-739       | 723-770       | 1869-2518       |
|     48 | 1435-1512    | 950-1043      | 1029-1086     | 1912-2818       |
|     96 | 1769-1811    | 3337-3459     | 1302-1346     | 1546-1753       |
|    192 | 1857-1992    | 3613-3715     | 1269-1345     | 1332-1508       |
|    384 | 1667-1793    | 3180-3300     | 1262-1364     | 1356-1450       |
+--------+--------------+---------------+---------------+-----------------+

> In case you do intend to run with synchronous_commit=off, or if you are
> just curious: running with a very high number of active connections
> often reveals subtle bottlenecks and interactions, and is very dependent
> on your hardware. Unless you actually intend to run your server with
> synchronous_commit=off and with a large number of active connections, it
> is probably not worth investigating this.

Please see the table above. The reason I'm digging deeper into this is
observation (2) above, i.e. I am unable to come up with any optimised
settings that perform better than the default settings for the concurrency
levels that I care about (100-150).

> I'm more interested in the low end. You should do much better than those
> reported numbers when clients=1 and synchronous_commit=off with the data
> on SSD. I think you said that pgbench is running on a different machine
> than the database, so perhaps it is just network overhead that is
> keeping this value down. What happens if you run them on the same
> machine?

I'm currently running this, but the early numbers are surprising. For
client=1, the numbers for optimised settings + shared_buffers=2G are:

- pgbench run over a 1 Gbps network: 74-77 tps
- pgbench run on the same machine: 152-153 tps (is this absolute number
  good enough given my hardware?)

Is the 1 Gbps network the bottleneck? Does it explain the three
observations given above? I'll wait for the current set of benchmarks to
finish, then re-run the benchmarks over the network and monitor network
utilisation.

[1] "Mostly default" settings are
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
Here's the previous table again -- trying to prevent the wrapping.

+--------+----------------------------------------------+-----------------+
|        |            synchronous_commit=on             | sync_commit=off |
+--------+--------------+---------------+---------------+-----------------+
| client | Defaults [1] | Optimised [2] | Optimised [2] | Optimised [2]   |
|        |              | (buffers=2G)  | (buffers=8G)  | (buffers=8G)    |
+--------+--------------+---------------+---------------+-----------------+
|      1 | 80-86        | 74-77         | 75-75         | 169-180         |
|      6 | 350-376      | 301-304       | 295-300       | 1265-1397       |
|     12 | 603-619      | 476-488       | 485-493       | 1746-2352       |
|     24 | 947-1015     | 678-739       | 723-770       | 1869-2518       |
|     48 | 1435-1512    | 950-1043      | 1029-1086     | 1912-2818       |
|     96 | 1769-1811    | 3337-3459     | 1302-1346     | 1546-1753       |
|    192 | 1857-1992    | 3613-3715     | 1269-1345     | 1332-1508       |
|    384 | 1667-1793    | 3180-3300     | 1262-1364     | 1356-1450       |
+--------+--------------+---------------+---------------+-----------------+
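As mentioned earlier in the thread, the next step is to monitor network
utilisation during the re-run. A sketch for that (assumes the sysstat
package is installed; replace eth0 with the actual benchmark interface):

    # Print per-second RX/TX throughput for eth0 while pgbench runs:
    sar -n DEV 1 | grep -E 'IFACE|eth0'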
Re: Benchmarking: How to identify bottleneck (limiting factor) and achieve "linear scalability"?
All this benchmarking has led me to a philosophical question: why does PG
need shared_buffers in the first place? What's wrong with letting the OS
do the caching/buffering? Isn't it optimised for exactly this kind of
thing?
