Re: Advice on using materialized views
On Thu, 09 Dec 2021 16:06:27 + "Phil Endecott" wrote:
> Thanks both for your replies.
>
> Wicher wrote:
> > On Mon, 06 Dec 2021 18:48:47 + "Phil Endecott" wrote:
> >> and I need to modify the definition of a view that filters the "new"
> >> values from the raw table each time the materialised view is
> >> refreshed.
> >
> > You won't necessarily need to rewrite the "recent data" view definitions,
> > I think. What is deemed "recent" depends on what's in the materialized
> > views (it'd be anything newer than whatever is in there). The good news
> > is that you can simply query for that :-)
> > So trivially, in your "the data that is more recent than the stuff from
> > the materialized views" non-materialized view you'd use a definition like
> > SELECT WHERE sometimestamp > (select max(sometimestamp) from the_materialized_view)
> > or something along those lines.
>
> I guess I was hoping that someone would suggest a more "magic"
> way to do this sort of thing. Actually I'm a bit surprised that
> materialised views don't even have a way to either
>
> - Refresh a materialised view whenever a source table is modified;
>
> - Refresh a materialised view whenever it is read, if a source table
>   has changed since it was last refreshed.
>
> Beyond that, I could imagine smart updates where e.g. if you
> modify source table rows with primary key K, then you only need
> to refresh materialised view rows derived from K.
>
> I think this could all be done on top of triggers. I wonder, do any
> other databases do things like this automagically?

Not too long ago I asked the list something similar but came up short:
https://www.postgresql.org/message-id/20211129102315.058595fe@tipi

Here's my own take on that problem area, tangential to your question. This
project aims to do (or make it easier to do) the things you mention:
https://git.sr.ht/~nullenenenen/DBSamizdat
It supports your first use case out of the box, and may make your second
use case easier to accommodate; give it a go :-)

There's a sweet spot for materialized views. But at some point (volume,
computational load, freshness requirements) it becomes necessary to use
tables instead so that you can indeed implement efficient partial
recalculation. As far as I know. I too am curious about other approaches.
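For the trigger route specifically, here is a minimal sketch of "refresh
whenever a source table is modified". The names raw_readings and mv_summary
are made up for illustration, not taken from your schema:

  -- Hypothetical objects: raw_readings is the source table, mv_summary the
  -- materialized view built from it.
  CREATE OR REPLACE FUNCTION refresh_mv_summary() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
      -- Plain REFRESH recomputes the whole view and takes an ACCESS EXCLUSIVE
      -- lock on it; with a unique index on mv_summary you could use
      -- REFRESH MATERIALIZED VIEW CONCURRENTLY instead.
      REFRESH MATERIALIZED VIEW mv_summary;
      RETURN NULL;  -- return value is ignored for statement-level AFTER triggers
  END;
  $$;

  CREATE TRIGGER raw_readings_refresh_mv
      AFTER INSERT OR UPDATE OR DELETE ON raw_readings
      FOR EACH STATEMENT
      EXECUTE PROCEDURE refresh_mv_summary();

Note that this still recomputes everything on every modifying statement,
which is exactly why it stops scaling at some point; the per-key partial
refresh you describe needs an ordinary table maintained by row-level
triggers instead.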
Re: Need to know more about pg_test_fsync utility
On Mon, Dec 13, 2021 at 3:04 PM PGSQL DBA wrote:
> As you mentioned in question-8, "I'd investigate whether data is being cached
> unexpectedly, perhaps indicating that committed transactions be lost in a
> system crash event." So, I would like to know that if we configure the disk
> for the WALs with read+write disk cache then will it create any performance
> issue and show the attached output?

Which OS and file system are you running, and what kind of storage system
is it? If you can turn write caching on and off in your storage layer
without affecting the ops/sec, that's a pretty good clue that it's
respecting the cache control commands that the OS sends. The numbers you
showed are reassuringly low. Hmm, I wonder why you have such a low number
for non-sync'd writes (the last line).

I have a concrete example of using this tool to learn something about an
unfamiliar-to-me operating system, from this morning: I saw a system that
reported ~6k IOPS for open_datasync and only ~600 IOPS for fdatasync. With
some limited knowledge of file systems, I expect the latter to involve
sending a SYNCHRONIZE CACHE command to flush device caches, and the former
to do that, or to use a more efficient FUA flag to flush just particular
writes and not the whole write cache. I didn't expect it to go 10x faster,
so something had to be wrong. With some more investigation and reading I
learned that the storage drivers I was using do indeed ignore FUA cache
control flags, so that wal_sync_method is not crash safe; if you lose power
and come back up, you might lose a bunch of committed transactions from the
end of the WAL. I turned off write caches in the storage device, and then
open_datasync became a lot slower, matching my expectations.

These are primitive tools, but can help to check some assumptions...

Obligatory remark: PostgreSQL 9.5 is out of support, please see
https://www.postgresql.org/support/versioning/.

> I also would like to know is there any best Practice from PostgreSQL which
> mentions what is the disk latency required for the WAL & DATA disk?

No particular latency is required by PostgreSQL, and I don't have a general
answer to this. Is it fast enough for you? What sort of workload is it
running: OLTP, reporting, ...?
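If anyone wants to repeat that kind of comparison on a running server rather
than with pg_test_fsync alone, a rough sketch follows. Which wal_sync_method
values are available depends on the platform, and this is not a
recommendation for any particular setting:

  SHOW wal_sync_method;                           -- what the server uses now
  ALTER SYSTEM SET wal_sync_method = 'fdatasync';
  SELECT pg_reload_conf();                        -- this setting only needs a reload
  SHOW wal_sync_method;

Then re-run a commit-heavy test and compare, remembering that the faster
method is only a win if the storage actually honours the cache control
commands involved.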
Re: Need to know more about pg_test_fsync utility
Hi Thomas,

Apologies for uploading the wrong screenshot in the attachment. I have
uploaded the correct output of pg_test_fsync for your reference.

Please find our environment details below:

Infra: Azure IaaS
OS: Red Hat Enterprise Linux Server release 7.9
File System: XFS
Disk: Premium SSD
IOPS: 5000 per disk
Throughput: 200 MB/sec per disk
Disk Cache: Read+Write
Workload: OLTP

The actual issue is that only Commit & Prepared Commit statements are
executing slowly, taking 2 to 14 seconds. That's why we are analysing the
pg_test_fsync and FIO output for further troubleshooting, but we are unable
to find the right direction.

On Mon, 13 Dec 2021 at 10:45, Thomas Munro wrote:
> On Mon, Dec 13, 2021 at 3:04 PM PGSQL DBA wrote:
> > As you mentioned in question-8, "I'd investigate whether data is being
> > cached unexpectedly, perhaps indicating that committed transactions be lost
> > in a system crash event." So, I would like to know that if we configure the
> > disk for the WALs with read+write disk cache then will it create any
> > performance issue and show the attached output?
>
> Which OS and file system are you running and what kind of storage
> system is it? If you can turn write caching on and off in your
> storage layer without affecting the ops/sec that's a pretty good clue
> that it's respecting the cache control commands that the OS sends.
> The numbers you showed are reassuringly low. Hmm, I wonder why you
> have such a low number for non-sync'd writes (the last line).
>
> I have a concrete example of using this tool to learn something about
> an unfamiliar-to-me operating system, from this morning: I saw a
> system that reported ~6k IOPS for open_datasync and only ~600 IOPS for
> fdatasync. With some limited knowledge of file systems I expect the
> latter to involve sending a SYNCHRONIZE CACHE command to flush device
> caches, and the former to do that, or use a more efficient FUA flag to
> flush just particular writes and not the whole write cache. I didn't
> expect it to go 10x faster, so something had to be wrong. With some
> more investigation and reading I learned that the storage drivers I
> was using do indeed ignore FUA cache control flags, so that
> wal_sync_method is not crash safe; if you lose power and come back up,
> you might lose a bunch of committed transactions from the end of the
> WAL. I turned off write caches in the storage device, and then
> open_datasync became a lot slower, matching my expectations.
>
> These are primitive tools, but can help to check some assumptions...
>
> Obligatory remark: PostgreSQL 9.5 is out of support, please see
> https://www.postgresql.org/support/versioning/.
>
> > I also would like to know is there any best Practice from PostgreSQL
> > which mentions what is the disk latency required for the WAL & DATA disk?
>
> No particular latency is required by PostgreSQL, and I don't have a
> general answer to this. Is it fast enough for you? What sort of
> workload is it running, OLTP, reporting, ...?
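For anyone comparing notes, a quick way to list the settings on the commit
path in one query (a sketch only, not a suggestion to change anything):

  SELECT name, setting, context
  FROM pg_settings
  WHERE name IN ('fsync', 'wal_sync_method', 'synchronous_commit',
                 'full_page_writes', 'commit_delay', 'wal_buffers');

Since the slowness here is confined to commit statements, WAL flush latency
is the natural suspect, which is what pg_test_fsync exercises on the WAL
disk.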