Re: postgres on physical replica crashes
Hi Greig,

just last week I experienced the same situation as you did on a 10.3 physical replica (it even has checksums activated), and a few months ago on 9.6. We used the same resolution as you, and so far we haven't noticed any problems with data integrity on the replicas.

The logs were as follows:

2018-04-13 06:31:16.947 CEST [15603] FATAL: terminating walreceiver due to timeout
2018-04-13 06:31:16.948 CEST [15213] FATAL: invalid memory alloc request size 4280303616
2018-04-13 06:31:16.959 CEST [15212] LOG: startup process (PID 15213) exited with exit code 1
2018-04-13 06:31:16.959 CEST [15212] LOG: terminating any other active server processes
2018-04-13 06:31:16.959 CEST [19838] user@db WARNING: terminating connection because of crash of another server process
2018-04-13 06:31:16.959 CEST [19838] user@db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2018-04-13 06:31:16.959 CEST [19838] user@db HINT: In a moment you should be able to reconnect to the database and repeat your command.

This replica then refused to start up:

2018-04-13 09:25:15.941 CEST [1957] LOG: entering standby mode
2018-04-13 09:25:15.947 CEST [1957] LOG: redo starts at 1C/69C0FF30
2018-04-13 09:25:15.951 CEST [1957] LOG: consistent recovery state reached at 1C/69D9A9C0
2018-04-13 09:25:15.952 CEST [1956] LOG: database system is ready to accept read only connections
2018-04-13 09:25:15.953 CEST [1957] FATAL: invalid memory alloc request size 4280303616
2018-04-13 09:25:15.954 CEST [1956] LOG: startup process (PID 1957) exited with exit code 1

... until the WAL files in the hot standby's pg_wal were manually removed and re-downloaded from the primary.

Unfortunately I did not collect hard evidence, but I think I saw that the primary's replication slot's restart point was set to a position /after/ the standby's actual restart location. This time the error was noticed immediately and the required WAL was still present on the master. A few months ago I experienced the same situation on a 9.6 cluster, but there it was not noticed for a long time, and - despite using a replication slot! - the primary had already removed the required segments. Fortunately I could get them from a tape backup...

Best regards,

-hannes


On 2018-04-18 at 18:16, greigwise wrote:
> Hello. I've had several instances where postgres on my physical replica under
> version 9.6.6 is crashing with messages like the following in the logs:
>
> 2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2018-04-18 05:43:26 UTC dbname 5acf5e4a.6918 dbuser HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser WARNING: terminating connection because of crash of another server process
> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
> 2018-04-18 05:43:26 UTC dbname 5acf5e39.68e5 dbuser HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2018-04-18 05:43:27 UTC 5acf5e12.6819 LOG: database system is shut down
>
> When this happens, what I've found is that I can go into the pg_xlog directory
> on the replica, remove all the log files, and then postgres will restart and
> things seem to come back up normally.
>
> So, the question is what's going on here... is the log maybe getting corrupted
> in transmission somehow? Should I be concerned about the viability of my
> replica after having restarted in the described fashion?
>
> Thanks,
> Greig Wise
recovery_target_time and WAL fetch with streaming replication
Hi,

what is PostgreSQL's strategy for when to fetch WAL from the master while in streaming replication, and can it be tweaked?

I'm using a physical streaming replication slave to keep a database lagging about one month behind the primary, by setting "recovery_target_time" to the desired point in time. This setting is periodically advanced by a cronjob to allow the replica to roll forward. It's a 10.3-1 install on Debian.

It seems that as soon as the slave needs new WAL data to reach the requested target time, it will connect to the primary and fetch /all/ new WAL the primary has available for the slave's slot - up to "now", ignoring the recovery_target_time. So in my setup, the slave will connect to the primary once per month and download the whole next month's WAL data at once.

I do not care on which instance WAL is kept until needed, but I'd really prefer if the transfer (and disk space requirement!) were more evenly distributed. One option of course would be to use some transfer mechanism external to PostgreSQL... but so far I'm thinking there must be an easier way?

Thanks & best regards,

-hannes
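PS: to make the setup concrete, the recovery.conf on such a lagging standby would look roughly like this - connection string, slot name and timestamp are made up, not the real values:

    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=replicator'
    primary_slot_name = 'lagging_replica'
    recovery_target_time = '2018-04-01 00:00:00+02'
    recovery_target_action = 'pause'    # the default: wait at the target

The cronjob then rewrites recovery_target_time and restarts the standby (recovery.conf is only read at server start on PG 10), so the replica replays up to the new target and pauses again.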
Re: recovery_target_time and WAL fetch with streaming replication
Michael,

On 2018-05-13 at 08:23, Michael Paquier wrote:
> On Sun, May 13, 2018 at 01:39:48AM +0200, Hannes Erven wrote:
>> what is Postgresql's strategy when to fetch WAL from the master while in
>> streaming replication, and could it be tweaked?
>
> Fetching WAL from a primary (or another standby) cannot be directly tweaked
> per se. By definition, any WAL present will be automatically fetched and
> synced on the standby.

I also thought so, but this is not the case (and that's my whole point). When the standby finds sufficient local WAL with respect to its recovery_target_time, it seems it does not even attempt to connect to the primary. But when new WAL is needed, the standby will fetch /all/ WAL present on the master.

I'd say the standby should either:
- always connect to the primary and fetch any WAL present, or
- stop fetching/streaming WAL when it is not needed for the current recovery_target.

So ideally there would be an option to ask for the desired behaviour?

> For example, what people usually want to be protected from is an unfortunate
> DROP TABLE on the primary run by an administrator to be immediately replayed
> on the standby, losing the data. Hence delaying when WAL is replayed can offer
> some protection, and this can be achieved by setting recovery_min_apply_delay
> in recovery.conf. This will cause WAL records replaying transaction commits to
> wait for the amount of time specified by this parameter, giving you the time
> to recover from any failures with a standby which has a controlled synced delta.

recovery_min_apply_delay is an integer number of milliseconds, so the maximum value is approx. 25 days - which is not enough for my requirements. Also, transaction/MVCC visibility will not cover all cases; most importantly, it cannot protect against TRUNCATE (https://wiki.postgresql.org/wiki/MVCC_violations).

>> One option of course would be to use some transfer mechanism external to
>> Postgresql... but so far I'm thinking there must be an easier way?
>
> Another option I can think of here is to use a specific restore_command instead
> of streaming replication. Simply archive a WAL segment on the primary with some
> meta-data like the time it was archived, and then allow the standby to recover
> the segment only after a delta has passed.

Yes, but that's far less simple than just setting recovery_target_time.

Thanks again and best regards,

-hannes
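PS: just to sketch what that restore_command idea might look like - this is untested, and the script name, archive path and delay are made up:

    #!/bin/sh
    # delayed_restore.sh %f %p
    # Hand a segment to the standby only once the archived copy is older
    # than the desired delay; otherwise fail so recovery retries later.
    ARCHIVE=/archive/wal
    DELAY_DAYS=30
    seg="$ARCHIVE/$1"
    [ -f "$seg" ] || exit 1
    if [ -n "$(find "$seg" -mtime +"$DELAY_DAYS")" ]; then
        exec cp "$seg" "$2"
    fi
    exit 1

with restore_command = '/path/to/delayed_restore.sh %f %p' in the standby's recovery.conf.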
Re: postgres large database backup
On 30.11.22 at 20:01, Mladen Gogala wrote:
> On 11/30/22 10:40, Atul Kumar wrote:
>> Hi, I have a 10TB database running on Postgres 11 on CentOS 7 "on premises".
>> I need to schedule the backup of this database in a faster way. The scheduled
>> backup will be used for PITR purposes.
>
> Well, that depends on your configuration details. Is your DB located on a SAN
> device or JBOD?

You could also use a filesystem that can do atomic snapshots - like ZFS. However, I'm wondering why none of the previous respondents mentioned it?

Sure, ZFS may have its own performance implications... but my experience is that it makes "safety snapshots" and "on-demand throw-away testing environments" very convenient to use.

Best regards,

-hannes
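PS: to illustrate the kind of workflow I mean - the dataset names are made up, and for PITR-grade backups you would still combine snapshots with WAL archiving:

    # atomic "safety snapshot" of the dataset holding the data directory
    zfs snapshot tank/pgdata@before-maintenance

    # on-demand throw-away testing environment cloned from that snapshot
    zfs clone tank/pgdata@before-maintenance tank/pgdata-test
    zfs destroy tank/pgdata-test      # drop it when done

    # if things go wrong: roll back to the snapshot
    # (instance stopped; only works for the newest snapshot)
    zfs rollback tank/pgdata@before-maintenance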
Force Reconnect of streaming replication
Hi,

consider a PG10 master in one data center, streaming via the internet to another site. The receiving site has a fast primary uplink and a much slower backup link. When the primary link goes down, all traffic is routed through the backup connection: the streaming replication connection drops and is automatically re-established over the backup link. But when the primary link comes up again, the replication will still use the already established connection via the slow backup link.

If I kill the walsender process on the master, the slave will immediately reconnect over the "best" available link. But is this really the best way to force PG to re-connect the replication?

I guess I could just as well kill the walreceiver process on the slave, but still, I'm hesitant to put something like "killall pg_walsender" in a cronjob... Is there a better way?

Thanks for your opinions, best regards

-hannes
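PS: a slightly gentler variant of the same idea is to terminate the walsender from SQL on the master; the standby's walreceiver then reconnects on its own. The application_name filter below is an assumption - adjust it to whatever your standby sets in primary_conninfo (it defaults to "walreceiver"):

    -- on the master: disconnect the walsender serving this standby
    SELECT pg_terminate_backend(pid)
    FROM pg_stat_replication
    WHERE application_name = 'walreceiver';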
Re: Does creating readOnly connections, when possible, free up resources in Postgres?
Hi David,

> I saw that when you create a JDBC connection, you can specify
> readOnly=true. Would doing so somehow help Postgres manage its other
> connections?

If you know that a certain connection will be read-only, you could use a more aggressive pooling strategy.

Usually, a connection pool will return a server connection to the pool when the client closes its connection. pgbouncer, for example, offers a "statement" pooling mode, meaning that a server connection is returned to the pool after every completed SQL statement. That way, a single PostgreSQL server process can easily serve many client connections. If a specific use case fits into a single statement, you can even use this for writes!

Of course, you can't use server-side prepared statements, temporary tables, ... or anything else that depends on subsequent statements hitting the same server process.

For an application I'm running, we have dozens of GUI clients keeping a JDBC connection open all day checking for updates, and with statement pooling this is handled well by 1-3 server processes (auto-scaled by pgbouncer).

Best regards,

-hannes
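PS: a minimal pgbouncer.ini sketch of such a setup - names, ports and pool sizes are only illustrative, not taken from our installation:

    [databases]
    appdb = host=127.0.0.1 port=5432 dbname=appdb

    [pgbouncer]
    listen_addr = 0.0.0.0
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; release the server connection back to the pool after each statement
    pool_mode = statement
    ; a handful of server processes can serve many mostly-idle clients
    default_pool_size = 3
    max_client_conn = 200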
Unused files in the database directory after crashed VACUUM FULL
Hi,

I've just had a "VACUUM FULL" of a single table crash due to 100% disk usage. Clearly my fault, I was expecting the new table to be small enough.

After freeing up space, restarting the cluster and issuing another VACUUM FULL, I noticed that the cluster was way bigger than it should be. In the database's directory under base/, there was a large number of files sharing one particular filenode number that pg_filenode_relation() could not turn into a relation. As that number was just a bit smaller than the pg_relation_filepath() of the table I was working on, I guess these were the remains of the failed VACUUM FULL operation?

I removed those files, and a VACUUM ANALYZE of the whole database went fine.

So... is this the expected behaviour? Is there a safe procedure to check for and clean up "unnecessary" files in the cluster directory?

Version: PostgreSQL 10.3 (Debian 10.3-1.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

Table definition:
    ( id serial PRIMARY KEY,
      data bytea,
      parent_id integer NOT NULL )

The "data" column values are large enough to be TOASTed.

Thanks & best regards,

-hannes
Re: Unused files in the database directory after crashed VACUUM FULL
Hi again,

On 10.02.19 at 16:41, Tom Lane wrote:
> Hannes Erven writes:
>> I've just had a "VACUUM FULL" crash due to 100% disk usage. Clearly my fault,
>> I was expecting the new table to be small enough.
>
> What do you mean by "crash" exactly? A normal transactional failure should've
> cleaned up orphaned files. I suppose if the kernel decided to kill -9 the
> vacuum process, that wouldn't happen --- but that's not the usual response to
> out-of-disk-space.

Here's the exact log (the pgadmin process was running the VACUUM FULL):

2019-02-09 23:44:53.516 CET [20341] @/ <> PANIC: could not write to file "pg_wal/xlogtemp.20341": No space left on device
2019-02-09 23:44:53.516 CET [19181] @/ <> LOG: WAL writer process (PID 20341) was terminated by signal 6: Aborted
2019-02-09 23:44:53.516 CET [19181] @/ <> LOG: terminating any other active server processes
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db WARNING: terminating connection because of crash of another server process
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-02-09 23:44:53.516 CET [20816] postgres@IP/db HINT: In a moment you should be able to reconnect to the database and repeat your command.
2019-02-09 23:44:53.597 CET [19181] @/ <> LOG: all server processes terminated; reinitializing
2019-02-09 23:44:53.679 CET [11686] @/ <> LOG: database system was interrupted; last known up at 2019-02-09 23:44:40 CET
2019-02-09 23:44:58.761 CET [11686] @/ <> LOG: database system was not properly shut down; automatic recovery in progress
2019-02-09 23:44:58.807 CET [11686] @/ <> LOG: redo starts at 1DA/B3400798
2019-02-09 23:44:58.811 CET [11686] @/ <> FATAL: could not extend file "base/16400/612716_fsm": No space left on device
2019-02-09 23:44:58.811 CET [11686] @/ <> HINT: Check free disk space.
2019-02-09 23:44:58.811 CET [11686] @/ <> CONTEXT: WAL redo at 1DA/B3406310 for Heap/INSERT: off 4

After freeing space and restarting:

2019-02-10 06:28:25.622 CET [14344] @/ <> LOG: database system was interrupted while in recovery at 2019-02-09 23:44:58 CET
2019-02-10 06:28:25.622 CET [14344] @/ <> HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2019-02-10 06:28:26.762 CET [14344] @/ <> LOG: database system was not properly shut down; automatic recovery in progress
2019-02-10 06:28:26.831 CET [14344] @/ <> LOG: redo starts at 1DA/B3400798
2019-02-10 06:28:27.624 CET [14344] @/ <> LOG: redo done at 1DA/D5FFF9D8

>> Is there a safe procedure how to check/clean up "unnecessary" files in the
>> cluster directory?
>
> You just described it --- verify with pg_filenode_relation that the file
> doesn't correspond to any pg_class entry, then manually delete.

OK, I see.
I came up with this:

with d0 as (
    select oid from pg_database where datname = current_database()
), d1 as (
    select pg_ls_dir as fn,
           regexp_match(pg_ls_dir, '^([0-9]+)(.*)$') as match
    from d0, pg_ls_dir('base/'||d0.oid)
    order by 1
), d2 as (
    select fn, match[1] as base, match[2] as ext
    from d1
    where (fn NOT ILIKE 'pg_%')
), d3 as (
    select d.*, pg_filenode_relation(0, base::oid) as relation
    from d2 d
)
select fn,
       pg_size_pretty((pg_stat_file('base/'||d0.oid||'/'||fn)).size)
from d0, d3
where relation is null;

     fn     | pg_size_pretty
------------+----------------
 612713     | 25 MB
 612718     | 1024 MB
 612718.1   | 1024 MB
 612718.2   | 180 MB
 t11_621351 | 96 kB
 t11_621354 | 0 bytes
 t11_621356 | 8192 bytes
 t11_621357 | 160 kB
 t11_621360 | 0 bytes
 t11_621362 | 8192 bytes

So there's still 612713 and 612718* left to clean up; judging from the numbers and timestamps, they probably belong to other VACUUM FULLs I manually canceled in the pgadmin UI during my experiments.

Is this interesting enough to investigate further? If yes, I'll be happy to work on a test case.

In any case, I've now learned (how) to check for orphaned files whenever a database cluster's on-disk size is way off from the sum of pg_total_relation_size() + WAL...

Thanks again & best regards,

-hannes
Re: Where to store Blobs?
Hi,

On 13.03.19 at 15:28, Thomas Güttler wrote:
> Where do you store Blobs?

Within PostgreSQL, of course.

The system I have in mind stores ZIP and PDF files, usually a few MBs each; we're currently at a total of about 100 GB and there are no evident problems.

For this application, it is extremely important that saving/modifying the binaries is part of a larger transaction that completes/fails atomically. Replication is almost real-time, and access to the binaries is provided through the same libraries (JDBC/Hibernate) as everything else.

Best regards,

-hannes
RowLock and multiple transactions
Hi, when "SELECT .. WHERE .. FOR NO KEY UPDATE" is used synchronize access, and the transaction holding the lock completes, how does PostgreSQL decide /which one/ of multiple waiting transactions will the lock be granted to next? In my testing (on Ubuntu 16.1-1.pgdg20.04+1, 64bit) with a real-world application (that acquires multiple locks on a number of relations) it seems that it is always the transaction that attempted to lock _last_ ... ? I thought that would most probably be random, or if it was not, the order would have been explained in the docs? Thank you for any insights... Best regards -hannes
Big variance in execution times of simple queries
Hi community,

I'm looking at "SELECT * FROM pg_stat_statements" output and am puzzled by the huge differences between min_exec_time and max_exec_time even for simple queries.

The most extreme example is probably the statement used by the application's connection health check:

SELECT 1
    min=0.001, mean=0.00386, max=36.812

Other statements with huge variances include:

SET application_name=$1
    min=0.002, mean=0.005, max=9.177

SELECT * FROM table WHERE id=$1
(where id is the primary key column; the table has 0.5M rows and is frequently vacuum analyzed)
    min=0.010, mean=0.260, max=12338.665

According to the system's monitoring, there is no pressure on any resource (cpu/mem/io). It's PostgreSQL 13.5-2pgdg20.04+1 on Ubuntu 20.04; the VM has 12 CPUs and 16 GB memory, Ceph-based SSD storage (latency ~1.5 ms), and runs with max_connections=100 and usually 25-40 processes connected.

Is this to be expected? Is there something I can watch out for or monitor?

Thank you for any insights...

Best regards

-hannes
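PS: in case anyone wants to check their own workload, a query along these lines pulls out the same numbers (PG 13 column names; the calls filter and LIMIT are arbitrary):

    SELECT query,
           calls,
           round(min_exec_time::numeric, 3)    AS min_ms,
           round(mean_exec_time::numeric, 3)   AS mean_ms,
           round(max_exec_time::numeric, 3)    AS max_ms,
           round(stddev_exec_time::numeric, 3) AS stddev_ms
    FROM pg_stat_statements
    WHERE calls > 100
    ORDER BY max_exec_time / greatest(mean_exec_time, 0.001) DESC
    LIMIT 20;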
Re: array_agg and/or =ANY doesn't appear to be functioning as I expect
Hi Rhys,

On 2018-01-21 at 02:42, Rhys A.D. Stewart wrote:
> Greetings All,
>
> I'm having an issue which is very perplexing. The having clause in a query
> doesn't appear to be working as I expect it. Either that or my understanding
> of array_agg() is flawed.
>
> [...]
>
> with listing as (
>     select start_vid, end_vid,
>            array_agg(node order by path_seq) node_array,
>            array_agg(edge order by path_seq) edge_array
>     from confounded.dataset
>     group by start_vid, end_vid
>     having true = ALL (array_agg(truth))
> )
> select count(*)
> from confounded.dataset
> where node in (select distinct unnest(node_array) from listing)
>   and truth = false;
>
> I would expect the above query to return 0 rows.

The answer is in your data: "node" is not a UNIQUE field, and there are node values with multiple rows. E.g. node=977 has one row with truth=true and one with truth=false.

So what your second query really does is "select all node values from listing for which another entry with truth=false exists in the dataset".

Presuming that "seq" is a primary key [although not declared], you probably meant to restrict your query on that.

Best regards,

-hannes
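PS: to see those rows for yourself, something like this (using the column names from your query) lists every node that occurs with both truth values:

    SELECT node,
           count(*)              AS n_rows,
           count(DISTINCT truth) AS distinct_truth_values
    FROM confounded.dataset
    GROUP BY node
    HAVING count(DISTINCT truth) > 1;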