Need to know more about pg_test_fsync utility

2021-12-09 Thread PGSQL DBA
Hi Team,

I would like to understand the internals of pg_test_fsync utility as I know
pg_test_fsync is intended to give us a reasonable idea of what is the
fastest wal_sync_method for our system but as per the documentation it will
also provide the diagnostic information in the event of an I/O problem. So,
I have below questions:

1) How to interpret the output of pg_test_fsync?
2) What is the meaning of ops/sec & usecs/op?
3) How does this utility work internally?
4) What is the IO pattern of this utility? serial/sequence IO or Multiple
thread with Parallel IO?
5) Can we change the testing like FIO with multiple threads and parallel IO?
6) How a commit happened in the background  while executing this utility?
7) How can we use this tool to measure the I/O issue?
8) In which area or section in the output do we need to focus while
troubleshooting I/O issues?
9) What is the meaning of “Non-sync’ed 8kB writes?

I also would like to know is there any best Practice from PostgreSQL which
mentions what is the disk latency required for the WAL & DATA disk?

Thank You :)


Re: Need to know more about pg_test_fsync utility

2021-12-12 Thread PGSQL DBA
Hi Thomas,

Apologies for uploading the wrong screenshot in the attachment. I have
uploaded the correct output of pg_test_fsync for your reference .

Please find our environment details as below:
Infra: Azure IaaS
OS: Red Hat Enterprise Linux Server release 7.9
File System: XFS
Disk: Premium SSD
IOPS: 5000 per Disk
Throughput: 200 MB/Sec per Disk
Disk Cache: Read+Write
Workload: OLTP

Actual issue is we are getting slow execution of Commit & Prepared Commit
statements only.  Execution of these statements is taking 2 to 14 seconds.
That's why we are analysing the pg_test_fsync output & FIO for further
troubleshooting but unable to find any right directions.



On Mon, 13 Dec 2021 at 10:45, Thomas Munro  wrote:

> On Mon, Dec 13, 2021 at 3:04 PM PGSQL DBA  wrote:
> > As you mentioned in question-8, "I'd investigate whether data is being
> cached unexpectedly, perhaps indicating that committed transactions be lost
> in a system crash event." So, I would like to know that if we configure the
> disk for the WALs with read+write disk cache then will it create any
> performance issue and show the attached output?
>
> Which OS and file system are you running and what kind of storage
> system is it?  If you can turn write caching on and off in your
> storage layer without affecting the ops/sec that's a pretty good clue
> that it's respecting the cache control commands that the OS sends.
> The numbers you showed are reassuringly low.  Hmm, I wonder why you
> have such a low number for non-sync'd writes (the last line).
>
> I have a concrete example of using this tool to learn something about
> an unfamiliar-to-me operating system, from this morning: I saw a
> system that reported ~6k IOPS for open_datasync and only ~600 IOPS for
> fdatasync.  With some limited knowledge of file systems I expect the
> latter to involve sending a SYNCHRONIZE CACHE command to flush device
> caches, and the former to do that, or use a more efficient FUA flag to
> flush just particular writes and not the whole write cache.  I didn't
> expect it to go 10x faster, so something had to be wrong.  With some
> more investigation and reading I learned that the storage drivers I
> was using do indeed ignore FUA cache control flags, so that
> wal_sync_method is not crash safe; if you lose power and come back up,
> you might lose a bunch of committed transactions from the end of the
> WAL.  I turned off write caches in the storage device, and then
> open_datasync became a lot slower, matching my expectations.
>
> These are primitive tools, but can help to check some assumptions...
>
> Obligatory remark: PostgreSQL 9.5 is out of support, please see
> https://www.postgresql.org/support/versioning/.
>
> > I also would like to know is there any best Practice from PostgreSQL
> which mentions what is the disk latency required for the WAL & DATA disk?
>
> No particular latency is required by PostgreSQL, and I don't have a
> general answer to this.  Is it fast enough for you?  What sort of
> workload is it running, OLTP, reporting, ...?
>