Re: REINDEX vs VACUUM
+ pgsql-admin

Would appreciate any insights. Thanks.

On Sat, Dec 31, 2022 at 1:04 PM Hao Zhang wrote:
> What is the difference between reindex and vacuum's impact on the index file? I deleted an indexed row. Running either vacuum or reindex shows the index entry for the row is removed from the index page. I was under the impression that only reindex would remove dangling index entries. I am guessing that vacuum will not shrink the index file and will only add the deleted index entries to the free space map for the index? But REINDEX recreates the index file from scratch, so it is like vacuum full for an index?
>
> Thanks
Re: REINDEX vs VACUUM
Here is my understanding:

REINDEX recreates the index from scratch, using the data stored in the underlying table. With regard to the impact on the index file, it is the same as dropping and recreating the index manually. It can free up physical space in the file system. REINDEX will not vacuum the index.

VACUUM does many different things. One of them is vacuuming the indexes of the table being vacuumed: VACUUM removes index entries that point to dead rows in the underlying table. VACUUM will not rebuild the entire index. It recycles free index blocks (using the FSM), but it does not (usually) return space to the file system physically.

VACUUM FULL does that. VACUUM FULL will vacuum the index, but it does so by making a copy of the index file and reorganizing its content in order to free up space physically. In that regard, it is like REINDEX. Both VACUUM FULL and REINDEX block reads from the index while they run (by taking an ACCESS EXCLUSIVE lock).

Regards,
tamas

On 2022. 12. 31. 22:04, Hao Zhang wrote:
> [...]
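A concrete way to see the difference on a scratch database (all object names here are made up for illustration; exact sizes will vary):

```sql
-- Create a throwaway table and index, then delete most rows.
CREATE TABLE t (id int);
CREATE INDEX t_id_idx ON t (id);
INSERT INTO t SELECT generate_series(1, 100000);
DELETE FROM t WHERE id > 1000;

SELECT pg_relation_size('t_id_idx');  -- index file size before any maintenance

-- Plain VACUUM removes the index entries for the dead rows and records the
-- freed pages in the free space map, but the index file itself normally
-- keeps its size.
VACUUM t;
SELECT pg_relation_size('t_id_idx');

-- REINDEX rebuilds the index from the live rows only, so the new file is
-- much smaller and the old file's space is returned to the file system.
REINDEX INDEX t_id_idx;
SELECT pg_relation_size('t_id_idx');
```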
What is the best setup for distributed and fault-tolerant PG database?
Dear PG experts,

I am new to postgres, and I am also not a DBA. I am a solo developer who is trying to evaluate what database to use for my hybrid multi-tenancy sub-apps, i.e. users of the application will be authorised to use part or all of the application based on their authorisation levels. This delineation of user access also has to be supported by the database, if possible. Also, for audit purposes the data is append-only. And the design is based on just two tables (vertices and edges) to emulate a document-oriented (jsonb) graph structure.

Postgres is the database I am leaning towards for this project. But as I am not a DBA and also a solo developer, I am trying to understand how I can spend less time managing the DB and more time developing the application. I would like to have a distributed and fault-tolerant DB setup with multiple read and write nodes, with little to no configuration on my part, if possible. I am looking for a self-hosted open source solution.

Is this possible with PG? What is the best way to achieve this for a non-DBA solo developer like me?

Thanks and kind regards
Re: REINDEX vs VACUUM
I don't think VACUUM FULL (copy the table, create new indexes and other metadata, all in one command) actually vacuums tables. It's a misleading name. Something like REBUILD TABLE would be a better name.

On 1/4/23 07:25, Rébeli-Szabó Tamás wrote:
> [...]

--
Born in Arizona, moved to Babylonia.
curious postgres (crash) recovery behavior
We are using Crunchy PGO, which uses "pgbackrest". Over Christmas we had a disk-full error on the "pgbackrest" repo, followed by a disk-full error on the PostgreSQL instance pod. Unfortunately, a colleague then deleted the current "pg_wal" directory on the instance pod, so we had to do a point-in-time recovery to overcome this situation.

We started a PITR to 2022-12-23 01:34 based on the following "pgbackrest" backup:

sh-4.4$ pgbackrest info
stanza: db
    status: ok
    cipher: none

    db (current)
        wal archive min/max (13): 000B010B00C1/000C0111000A

        full backup: 20221222-230004F
            timestamp start/stop: 2022-12-22 23:00:04 / 2022-12-23 01:32:17
            wal start/stop: 000B010B00C1 / 000B010C005C
            database size: 46.3GB, database backup size: 46.3GB
            repo1: backup set size: 17.6GB, backup size: 17.6GB

During the point-in-time recovery, the "pgbackrest" pod terminated incorrectly with the error "pg_ctl: server did not start in time". There is a known PGO bug about this (the pg_ctl default timeout of 60 seconds cannot be changed). PGO then started the instance pod anyway, which led to a regular PostgreSQL crash recovery ending with "last completed transaction was at log time 2022-12-23 20:52:29.584555+01":
...
2023-01-04 15:26:35 CET : =>@ : 94-4=>63b58c9b.5e : 0 LOG: starting PostgreSQL 13.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2023-01-04 15:26:35 CET : =>@ : 94-5=>63b58c9b.5e : 0 LOG: listening on IPv4 address "0.0.0.0", port 5432
2023-01-04 15:26:35 CET : =>@ : 94-6=>63b58c9b.5e : 0 LOG: listening on IPv6 address "::", port 5432
2023-01-04 15:26:35 CET : =>@ : 94-7=>63b58c9b.5e : 0 LOG: listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"
2023-01-04 15:26:35 CET : =>@ : 108-1=>63b58c9b.6c : 0 LOG: database system was interrupted while in recovery at log time 2022-12-23 01:06:58 CET
2023-01-04 15:26:35 CET : =>@ : 108-2=>63b58c9b.6c : 0 HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2023-01-04 15:26:35 CET : =>@ : 108-3=>63b58c9b.6c : 0 LOG: entering standby mode
2023-01-04 15:26:35 CET : =>@ : 108-4=>63b58c9b.6c : 0 LOG: restored log file "000B.history" from archive
2023-01-04 15:26:35 CET : =>@ : 108-5=>63b58c9b.6c : 0 LOG: restored log file "000B010C000C" from archive
...
2023-01-04 15:30:06 CET : =>@ : 108-1198=>63b58c9b.6c : 0 LOG: restored log file "000B011000AD" from archive
2023-01-04 15:30:06 CET : =>@ : 108-1199=>63b58c9b.6c : 0 LOG: received promote request
2023-01-04 15:30:06 CET : =>@ : 108-1200=>63b58c9b.6c : 0 LOG: redo done at 110/AD0005B8
2023-01-04 15:30:06 CET : =>@ : 108-1201=>63b58c9b.6c : 0 LOG: last completed transaction was at log time 2022-12-23 20:52:29.584555+01
2023-01-04 15:30:07 CET : =>@ : 108-1202=>63b58c9b.6c : 0 LOG: restored log file "000B011000AD" from archive
2023-01-04 15:30:07 CET : =>@ : 108-1203=>63b58c9b.6c : 0 LOG: selected new timeline ID: 12
2023-01-04 15:30:07 CET : =>@ : 108-1204=>63b58c9b.6c : 0 LOG: archive recovery complete
2023-01-04 15:30:07 CET : =>@ : 108-1205=>63b58c9b.6c : 0 LOG: restored log file "000B.history" from archive
2023-01-04 15:30:07 CET : =>@ : 94-9=>63b58c9b.5e : 0 LOG: database system is ready to accept connections
...

QUESTION:

I wondered how the crash recovery could run to 2022-12-23 20:52:29.584555 and simply open the database. If I got it right, the full backup itself would be consistent as of "wal stop" 000B010C005C. PostgreSQL then additionally applied WAL files up to 000B011000AD (2022-12-23 20:52:29.584555) because it did not find any more WAL before the recovery target of 2022-12-23 01:34. Am I right, or did I miss something?
Re: REINDEX vs VACUUM
I have looked more into it and have found that VACUUM FULL (and CLUSTER) does in fact rebuild indexes; see for example:

https://github.com/postgres/postgres/blob/c8e1ba736b2b9e8c98d37a5b77c4ed31baf94147/src/backend/commands/cluster.c#L1463

I have also run some tests and have come to understand that REINDEX does seem to "vacuum" the index, in the sense that a subsequent VACUUM on the underlying table will not find any removable entries in the index. Ron is right, the term "vacuum" may be misleading.

On 2023. 01. 04. 16:34, Ron wrote:
> [...]
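A quick way to reproduce that observation (continuing with made-up object names; the exact VACUUM VERBOSE wording varies across versions):

```sql
DELETE FROM t WHERE id % 2 = 0;   -- create some dead rows
REINDEX INDEX t_id_idx;           -- rebuild the index from live rows only

-- The subsequent vacuum reports zero removable index row versions for
-- t_id_idx, i.e. the rebuild already left nothing for it to clean up.
VACUUM VERBOSE t;
```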
Corrupt WAL in replication slot
In a replication setup, one of the satellite databases that replicates from a consolidated database has a corrupted WAL file, more precisely the 0001009 file. I tried the pg_replication_slot_advance() function to move the pg_lsn past the error, but it tells me there is a data inconsistency: it enforces data consistency even though I want to skip that segment. As an observation, the base checkpoint is much further ahead. Is it possible to regain control of the replication slot?
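For reference, this is roughly the call being described. The slot name and the target LSN below are placeholders, not values from the affected system, and note that advancing a slot past missing WAL means the skipped changes never reach the subscriber:

```sql
-- Look at where the slot currently is (slot name is a placeholder).
SELECT slot_name, slot_type, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'satellite_slot';

-- Try to move the slot forward past the damaged WAL segment.
-- The target LSN must not be behind restart_lsn or ahead of the current
-- flush position.
SELECT pg_replication_slot_advance('satellite_slot', '1/2000000'::pg_lsn);
```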
Re: curious postgres (crash) recovery behavior
I think your log shows a continued archive recovery (PITR), not a crash recovery. For a crash recovery (automatic recovery), you would see something like this in the server log:

LOG: database system was interrupted; last known up at 2022-12-23 20:50:13 CET
LOG: database system was not properly shut down; automatic recovery in progress

In your case, the server started in archive recovery mode (standby mode) again, restored archived WAL, and recovered the cluster from it.

You must have set the recovery_target_action parameter to 'promote' (probably via pgbackrest's --target-action parameter), because the server started in production mode (read-write mode) after a successful archive recovery.

The server started in standby mode again because the recovery had not completed, so the standby.signal file had not been deleted from the data directory. The server knew that it was an interrupted recovery by looking at the database cluster state in the control file ("in archive recovery" instead of "shutdown in recovery").
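For what it's worth, the setup being described corresponds to recovery settings along these lines. This is a sketch with illustrative values, not taken from the actual cluster; on PostgreSQL 12+ these live in postgresql.conf plus a signal file:

```
# postgresql.conf
restore_command = 'pgbackrest --stanza=db archive-get %f "%p"'
recovery_target_time = '2022-12-23 01:34:00+01'
recovery_target_action = 'promote'   # go read-write once the target is reached

# An empty standby.signal (or recovery.signal) file in the data directory
# puts the server into recovery at startup. It is removed only when recovery
# completes, which is why an interrupted PITR resumes as archive recovery
# on the next start instead of plain crash recovery.
```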
Re: What is the best setup for distributed and fault-tolerant PG database?
On 1/4/23 06:26, Age Apache wrote:
> [...]

None of the experts chimed in, so I'll ante up my $0.02.

It won't be possible unless you become a serious DBA _and_ solo (full-stack) developer. Or you pay for DB support.
Re: What is the best setup for distributed and fault-tolerant PG database?
What Rob said... plus, I would urge you to give some more thought to "for audit purposes the data is append only". If your application is ever successful, non-trivial storage costs are something you'll need to deal with sooner or later.

Anyway, what you ask is certainly achievable, but not without sustained effort. IMO your options are: spend the time to learn on your own, with much reading plus trial and error; pay somebody to set it up for you; or cross the high-availability bridge after you've got something of substance developed, app-wise, on a single local DB. Best of luck!

On Wed, Jan 4, 2023, 6:54 p.m. Rob Sargent wrote:
> [...]
Re: What is the best setup for distributed and fault-tolerant PG database?
>>> age.apa...@gmail.com wrote:
>>> [...]
>>
>> robjsarg...@gmail.com wrote:
>> [...]
>
> saul.perd...@gmail.com wrote:
> [...]

There's always Internet search. Mention as many key phrases as you see fit. For example:

> Fully managed cloud service for highly available, fault tolerant, Postgres-compatible distributed SQL database

Look at my email address. That outfit is among the hits. So full disclosure is done. But, as they say, "other services are available". So I don't think that my suggested search string is inappropriate for a list like this.