Re: REINDEX vs VACUUM

2023-01-04 Thread Hao Zhang
+ pgsql-admin
Would appreciate any insights. Thanks

On Sat, Dec 31, 2022 at 1:04 PM Hao Zhang  wrote:

> What is the difference between reindex and vacuum's impact on index file?
> I deleted an indexed row. Running either vacuum or reindex shows the index
> entry for the row is removed from the index page. I was under the
> impression that only reindex will remove dangling index entries. I am
> guessing that vacuum will not shrink the index file and will only add
> deleted index entries in the free space file for the index? But REINDEX is
> recreating the index file from scratch so it is like vacuum full for index?
>
> Thanks
>


Re: REINDEX vs VACUUM

2023-01-04 Thread Rébeli-Szabó Tamás

Here is my understanding:

REINDEX recreates the index from scratch, using the data stored in the 
underlying table. It is the same as dropping and recreating the index 
manually, with regard to the impact on the index file. It can free up 
physical space in the file system. REINDEX will not vacuum the index.


VACUUM does many different things. One of them is vacuuming indexes (for 
the underlying table that is being vacuumed). VACUUM will remove index 
entries that are pointing to dead rows in the underlying table. VACUUM 
will not rebuild the entire index.


VACUUM recycles free index blocks (using FSM), but it does not (usually) 
free up space for the file system physically. VACUUM FULL does that.


VACUUM FULL will vacuum the index, but it will do it by making a copy of 
the index (file) and reorganizing its content in order to free up space 
physically. In that regard, it is like REINDEX. Both VACUUM FULL and 
REINDEX will block reads from the index during the process (by taking an 
ACCESS EXCLUSIVE lock).


Regards,

tamas

2022. 12. 31. 22:04 keltezéssel, Hao Zhang írta:
What is the difference between reindex and vacuum's impact on index 
file? I deleted an indexed row. Running either vacuum or reindex shows 
the index entry for the row is removed from the index page. I was 
under the impression that only reindex will remove dangling index 
entries. I am guessing that vacuum will not shrink the index file and 
will only add deleted index entries in the free space file for the 
index? But REINDEX is recreating the index file from scratch so it is 
like vacuum full for index?


Thanks





What is the best setup for distributed and fault-tolerant PG database?

2023-01-04 Thread Age Apache
Dear PG experts,

I am new to postgres, and I am also not a DBA. I am a solo developer who is
trying to evaluate what database to use for my hybrid multi-tenancy
sub-apps i.e. users of the application will be authorised to use part or
whole of the application based on their authorisation levels. This
delineation of user access has to also be supported by the database, if
possible. Also, for audit purposes the data is append only. And the design
is based on just two tables(vertices and edges) to emulate a
document-oriented(jsonb) graph structure.

Postgres is the database I am leaning towards for this project. But as I am
not a DBA and also a solo developer, I am trying to understand how I can
spend less time managing the DB and more time developing the application. I
would like to have a distributed and fault-tolerant DB setup with multiple
read and write nodes with little to no configuration on my part, if
possible. I am looking for a self-hosted open source solution.

Is this possible with PG? What is the best way to achieve this for a
non-DBA solo developer like me?

Thanks and kind regards


Re: REINDEX vs VACUUM

2023-01-04 Thread Ron
I don't think VACUUM FULL (copy the table, create new indices and other 
metadata all in one command) actually vacuums tables.  It's a misleading name.


Something like REBUILD TABLE would be a better name.

On 1/4/23 07:25, Rébeli-Szabó Tamás wrote:

Here is my understanding:

REINDEX recreates the index from scratch, using the data stored in the 
underlying table. It is the same as dropping and recreating the index 
manually, with regard to the impact on the index file. It can free up 
physical space in the file system. REINDEX will not vacuum the index.


VACUUM does many different things. One of them is vacuuming indexes (for 
the underlying table that is being vacuumed). VACUUM will remove index 
entries that are pointing to dead rows in the underlying table. VACUUM 
will not rebuild the entire index.


VACUUM recycles free index blocks (using FSM), but it does not (usually) 
free up space for the file system physically. VACUUM FULL does that.


VACUUM FULL will vacuum the index, but it will do it by making a copy of 
the index (file) and reorganizing its content in order to free up space 
physically. In that regard, it is like REINDEX. Both VACUUM FULL and 
REINDEX will block reads from the index during the process (by taking an 
ACCESS EXCLUSIVE lock).


Regards,

tamas

2022. 12. 31. 22:04 keltezéssel, Hao Zhang írta:
What is the difference between reindex and vacuum's impact on index file? 
I deleted an indexed row. Running either vacuum or reindex shows the 
index entry for the row is removed from the index page. I was under the 
impression that only reindex will remove dangling index entries. I am 
guessing that vacuum will not shrink the index file and will only add 
deleted index entries in the free space file for the index? But REINDEX 
is recreating the index file from scratch so it is like vacuum full for 
index?


Thanks





--
Born in Arizona, moved to Babylonia.




curious postgres (crash) recovery behavior

2023-01-04 Thread Zwettler Markus (OIZ)
We are using Crunchy PGO which uses "pgbackrest".
Over Christmas we had a disk full error on the "pgbackrest" repo followed by a 
disk full error on the PostgreSQL instance pod.
Unfortunately, a colleague then deleted the current "pg_wal" directory on the 
instance pod.
So we had to do a point-in-time recovery to overcome this situation.



we started a PITR to 2022-12-23 01:34 based on the following "pgbackrest" 
backup:

sh-4.4$ pgbackrest info
stanza: db
status: ok
cipher: none

db (current)
wal archive min/max (13): 
000B010B00C1/000C0111000A

full backup: 20221222-230004F
timestamp start/stop: 2022-12-22 23:00:04 / 2022-12-23 01:32:17
wal start/stop: 000B010B00C1 / 000B010C005C
database size: 46.3GB, database backup size: 46.3GB
repo1: backup set size: 17.6GB, backup size: 17.6GB



During the point-in-time recovery, the "pgbackrest" pod terminated incorrectly 
with an error "pg_ctl: server did not start in time".
There is a known PGO bug on this (pg_ctl default timeout of 60 secs cannot be 
changed).


PGO started the instance pod anyway in following, which leads to a regular 
PostgreSQL crash recovery ending with "last completed transaction was at log 
time 2022-12-23 20:52:29.584555+01":
...
2023-01-04 15:26:35 CET : =>@ : 94-4=>63b58c9b.5e : 0 LOG:  starting 
PostgreSQL 13.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 
(Red Hat 8.5.0-10), 64-bit
2023-01-04 15:26:35 CET : =>@ : 94-5=>63b58c9b.5e : 0 LOG:  listening on 
IPv4 address "0.0.0.0", port 5432
2023-01-04 15:26:35 CET : =>@ : 94-6=>63b58c9b.5e : 0 LOG:  listening on 
IPv6 address "::", port 5432
2023-01-04 15:26:35 CET : =>@ : 94-7=>63b58c9b.5e : 0 LOG:  listening on 
Unix socket "/tmp/postgres/.s.PGSQL.5432"
2023-01-04 15:26:35 CET : =>@ : 108-1=>63b58c9b.6c : 0 LOG:  database 
system was interrupted while in recovery at log time 2022-12-23 01:06:58 CET
2023-01-04 15:26:35 CET : =>@ : 108-2=>63b58c9b.6c : 0 HINT:  If this has 
occurred more than once some data might be corrupted and you might need to 
choose an earlier recovery target.
2023-01-04 15:26:35 CET : =>@ : 108-3=>63b58c9b.6c : 0 LOG:  entering 
standby mode
2023-01-04 15:26:35 CET : =>@ : 108-4=>63b58c9b.6c : 0 LOG:  restored log 
file "000B.history" from archive
2023-01-04 15:26:35 CET : =>@ : 108-5=>63b58c9b.6c : 0 LOG:  restored log 
file "000B010C000C" from archive
...
2023-01-04 15:30:06 CET : =>@ : 108-1198=>63b58c9b.6c : 0 LOG:  restored 
log file "000B011000AD" from archive
2023-01-04 15:30:06 CET : =>@ : 108-1199=>63b58c9b.6c : 0 LOG:  received 
promote request
2023-01-04 15:30:06 CET : =>@ : 108-1200=>63b58c9b.6c : 0 LOG:  redo done 
at 110/AD0005B8
2023-01-04 15:30:06 CET : =>@ : 108-1201=>63b58c9b.6c : 0 LOG:  last 
completed transaction was at log time 2022-12-23 20:52:29.584555+01
2023-01-04 15:30:07 CET : =>@ : 108-1202=>63b58c9b.6c : 0 LOG:  restored 
log file "000B011000AD" from archive
2023-01-04 15:30:07 CET : =>@ : 108-1203=>63b58c9b.6c : 0 LOG:  selected 
new timeline ID: 12
2023-01-04 15:30:07 CET : =>@ : 108-1204=>63b58c9b.6c : 0 LOG:  archive 
recovery complete
2023-01-04 15:30:07 CET : =>@ : 108-1205=>63b58c9b.6c : 0 LOG:  restored 
log file "000B.history" from archive
2023-01-04 15:30:07 CET : =>@ : 94-9=>63b58c9b.5e : 0 LOG:  database system 
is ready to accept connections
...



QUESTION:
=
I wondered how the crash recovery can run to 2022-12-23 20:52:29.584555 and 
simply open the database?

When I got it right the full backup itself would be consistent with "wal stop 
000B010C005C".
PostgreSQL additionally added some wal files until 000B011000AD 
(2022-12-23 20:52:29.584555) because it didn't find any more before the 
recovery target of 2022-12-23 01:34.

Am I right or did I miss something?




Re: REINDEX vs VACUUM

2023-01-04 Thread Rébeli-Szabó Tamás
I have looked more into it and have found that VACUUM FULL (and CLUSTER) 
does in fact rebuild indexes, see for example: 
https://github.com/postgres/postgres/blob/c8e1ba736b2b9e8c98d37a5b77c4ed31baf94147/src/backend/commands/cluster.c#L1463


I have also run some tests and have come to understand that REINDEX 
seems to "vacuum" the index in the sense that a subsequent VACUUM on the 
underlying table will not find any removable entries in the index.


Ron is right, the term "vacuum" may be misleading.

2023. 01. 04. 16:34 keltezéssel, Ron írta:
I don't think VACUUM FULL (copy the table, create new indices and 
other metadata all in one command) actually vacuums tables.  It's a 
misleading name.


Something like REBUILD TABLE would be a better name.

On 1/4/23 07:25, Rébeli-Szabó Tamás wrote:

Here is my understanding:

REINDEX recreates the index from scratch, using the data stored in 
the underlying table. It is the same as dropping and recreating the 
index manually, with regard to the impact on the index file. It can 
free up physical space in the file system. REINDEX will not vacuum 
the index.


VACUUM does many different things. One of them is vacuuming indexes 
(for the underlying table that is being vacuumed). VACUUM will remove 
index entries that are pointing to dead rows in the underlying table. 
VACUUM will not rebuild the entire index.


VACUUM recycles free index blocks (using FSM), but it does not 
(usually) free up space for the file system physically. VACUUM FULL 
does that.


VACUUM FULL will vacuum the index, but it will do it by making a copy 
of the index (file) and reorganizing its content in order to free up 
space physically. In that regard, it is like REINDEX. Both VACUUM 
FULL and REINDEX will block reads from the index during the process 
(by taking an ACCESS EXCLUSIVE lock).


Regards,

tamas

2022. 12. 31. 22:04 keltezéssel, Hao Zhang írta:
What is the difference between reindex and vacuum's impact on index 
file? I deleted an indexed row. Running either vacuum or reindex 
shows the index entry for the row is removed from the index page. I 
was under the impression that only reindex will remove dangling 
index entries. I am guessing that vacuum will not shrink the index 
file and will only add deleted index entries in the free space file 
for the index? But REINDEX is recreating the index file from scratch 
so it is like vacuum full for index?


Thanks










Corrupt WAL in replication slot

2023-01-04 Thread Fontana Daniel (Desartec S.R.L.)
In a replication scheme, one of the satellite databases that replicates
against a consolidated database has one of the WAL files corrupted, more
precisely the 0001009 file.

I try the pg_replication_slot_advance() function to move pg_lsn to skip the
error, but it tells me there is a data inconsistency. You must control the
data consistency even though I want to skip them.

As an observation, the base checkpoint is much further ahead.

Is it possible to regain control of the replication slot?


--
Este correo electrónico ha sido analizado en busca de virus por el software 
antivirus de Avast.
www.avast.com




Re: curious postgres (crash) recovery behavior

2023-01-04 Thread Rébeli-Szabó Tamás
I think your log shows a continued archive recovery (PITR), not a crash 
recovery.



For a crash recovery (automatic recovery), you would see something like 
this in the server log:


LOG:  database system was interrupted; last known up at 2022-12-23 
20:50:13 CET
LOG:  database system was not properly shut down; automatic recovery in 
progress


In your case, the server started in archive recovery mode (standby mode) 
again, restored archived WAL, and recovered the cluster from it.


You must have set the recovery_target_action parameter to 'promote' 
(probably via pgbackrest's --target-action parameter), because the 
server started in production mode (read-write mode) after a successful 
archive recovery.


The server started in standby mode again because the recovery had not 
completed, so the standby.signal file had not been deleted from the data 
directory.


The server knew that it was an interrupted recovery by looking at the 
database cluster state in the control file ("in archive recovery" 
instead of "shutdown in recovery").






Re: What is the best setup for distributed and fault-tolerant PG database?

2023-01-04 Thread Rob Sargent

On 1/4/23 06:26, Age Apache wrote:

Dear PG experts,

I am new to postgres, and I am also not a DBA. I am a solo developer 
who is trying to evaluate what database to use for my hybrid 
multi-tenancy sub-apps i.e. users of the application will be 
authorised to use part or whole of the application based on their 
authorisation levels. This delineation of user access has to also be 
supported by the database, if possible. Also, for audit purposes the 
data is append only. And the design is based on just two 
tables(vertices and edges) to emulate a document-oriented(jsonb) graph 
structure.


Postgres is the database I am leaning towards for this project. But as 
I am not a DBA and also a solo developer, I am trying to understand 
how I can spend less time managing the DB and more time developing the 
application. I would like to have a distributed and fault-tolerant DB 
setup with multiple read and write nodes with little to no 
configuration on my part, if possible. I am looking for a self-hosted 
open source solution.


Is this possible with PG? What is the best way to achieve this for a 
non-DBA solo developer like me?


Thanks and kind regards

None of the experts chimed in so I ante up my $0.02.

It won't be possible unless you become a serious DBA _and_ solo (full 
stack) developer.  Or you pay for db support.




Re: What is the best setup for distributed and fault-tolerant PG database?

2023-01-04 Thread Saul Perdomo
What Rob said... plus, I would urge you to give some more thought to "for
audit purposes the data is append only". If your application is ever
successful, non-insignificant storage costs are something you'll need to
deal with sooner or later.

Anyway, what you ask is certainly achievable, but not without sustained
effort. IMO your options are: spend the time to learn on your own with much
reading + trial and error; pay somebody to set it up for you; or, cross the
high-availability bridge after you've got something of substance developed,
app-wise, on a single local DB.

Best of luck!


On Wed, Jan 4, 2023, 6:54 p.m. Rob Sargent  wrote:

> On 1/4/23 06:26, Age Apache wrote:
>
> Dear PG experts,
>
> I am new to postgres, and I am also not a DBA. I am a solo developer who
> is trying to evaluate what database to use for my hybrid multi-tenancy
> sub-apps i.e. users of the application will be authorised to use part or
> whole of the application based on their authorisation levels. This
> delineation of user access has to also be supported by the database, if
> possible. Also, for audit purposes the data is append only. And the design
> is based on just two tables(vertices and edges) to emulate a
> document-oriented(jsonb) graph structure.
>
> Postgres is the database I am leaning towards for this project. But as I
> am not a DBA and also a solo developer, I am trying to understand how I can
> spend less time managing the DB and more time developing the application. I
> would like to have a distributed and fault-tolerant DB setup with multiple
> read and write nodes with little to no configuration on my part, if
> possible. I am looking for a self-hosted open source solution.
>
> Is this possible with PG? What is the best way to achieve this for a
> non-DBA solo developer like me?
>
> Thanks and kind regards
>
> None of the experts chimed in so I ante up my $0.02.
>
> It won't be possible unless you become a serious DBA _and_ solo (full
> stack) developer.  Or you pay for db support.
>
>
>


Re: What is the best setup for distributed and fault-tolerant PG database?

2023-01-04 Thread Bryn Llewellyn
>>> age.apa...@gmail.com wrote:
>>> 
>>> I am new to postgres, and I am also not a DBA. I am a solo developer who is 
>>> trying to evaluate what database to use for my hybrid multi-tenancy 
>>> sub-apps i.e. users of the application will be authorised to use part or 
>>> whole of the application based on their authorisation levels. This 
>>> delineation of user access has to also be supported by the database, if 
>>> possible. Also, for audit purposes the data is append only. And the design 
>>> is based on just two tables(vertices and edges) to emulate a 
>>> document-oriented(jsonb) graph structure.
>>> 
>>> Postgres is the database I am leaning towards for this project. But as I am 
>>> not a DBA and also a solo developer, I am trying to understand how I can 
>>> spend less time managing the DB and more time developing the application. I 
>>> would like to have a distributed and fault-tolerant DB setup with multiple 
>>> read and write nodes with little to no configuration on my part, if 
>>> possible. I am looking for a self-hosted open source solution.
>>> 
>>> Is this possible with PG? What is the best way to achieve this for a 
>>> non-DBA solo developer like me?
>> 
>> robjsarg...@gmail.com wrote:
>> 
>> None of the experts chimed in so I ante up my $0.02. It won't be possible 
>> unless you become a serious DBA _and_ solo (full stack) developer.  Or you 
>> pay for db support.
> 
> saul.perd...@gmail.com wrote:
> 
> What Rob said... plus, I would urge you to give some more thought to "for 
> audit purposes the data is append only". If your application is ever 
> successful, non-insignificant storage costs are something you'll need to deal 
> with sooner or later.
> 
> Anyway, what you ask is certainly achievable, but not without sustained 
> effort. IMO your options are: spend the time to learn on your own with much 
> reading + trial and error; pay somebody to set it up for you; or, cross the 
> high-availability bridge after you've got something of substance developed, 
> app-wise, on a single local DB.

There’s always Internet search. Mention as many key phrases as you see fit. For 
example:

> Fully managed cloud service for highly available, fault tolerant, 
> Postgres-compatible distributed SQL database

Look at my email address. That outfit is among the hits. So full disclosure is 
done. But, as they say, “other services are available”. So I don’t think that 
my suggested search string is inappropriate for a list like this.