Re: Authentication?

2018-03-07 Thread Stephen Frost
Greetings,

* Bjørn T Johansen (b...@havleik.no) wrote:
> Is it possible to use one authentication method as default, like LDAP, and if 
> the user is not found, then try to authenticate using
> md5/scram-sha-256 ?

Not directly in pg_hba.conf.  You might be able to construct a system
which works like this using PAM though, but it wouldn't be much fun.

LDAP use really should be discouraged as it involves sending the
password to the PG server.  If you are operating in an Active Directory
environment then you should be using GSSAPI/Kerberos.

SCRAM is a good alternative as it doesn't send the password to the
server either, though that is only available in PG10, of course.
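
As a rough sketch of what that looks like on PG10 (existing passwords
have to be re-set so they're stored as SCRAM verifiers, and the network
below is just a placeholder):

  # postgresql.conf
  password_encryption = scram-sha-256

  # pg_hba.conf
  host    all    all    10.0.0.0/8    scram-sha-256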

Thanks!

Stephen



Re: Authentication?

2018-03-07 Thread Stephen Frost
Greetings,

* Benedict Holland (benedict.m.holl...@gmail.com) wrote:
> Not to get off topic, can you authenticate database users via Kerberos?

Absolutely.  GSSAPI is the auth method to use for Kerberos.

Thanks!

Stephen



Re: PgBackrest questions

2018-03-14 Thread Stephen Frost
Greetings,

* chiru r (chir...@gmail.com) wrote:
> I am testing Pgbackrest and I have few questions.

Great!

> 1. I used postgres user to perform backups and restores with Pgbackrest
> tool.
> The Trust authentication in pg_hba.conf file is working without issues.

Please don't use 'trust'.

> If I use md5 authentication in pg_hba_conf file and postgres user password
> .pgpass file, the pgbackrest backup is failing.

There really shouldn't ever be a need to use md5 authentication with
pgbackrest.  Instead, I'd strongly suggest you use 'peer'.  The 'peer'
method is perfectly safe as it depends on the authentication which
Unix provides, but it doesn't require a password or any of the
associated complications.  Note that 'trust' should *not* be used and
I'm glad to see that you're looking for alternatives to using 'trust'.
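
As a minimal sketch (adjust the database/user columns to match your
environment), the pg_hba.conf entry for local backups run as the
postgres OS user could simply be:

  # TYPE   DATABASE   USER       METHOD
  local    all        postgres   peer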

> 2.  All the restores through the backrest is going to refer the  Wal
> archive files under archive//* directory, which are taken from
> pgbackrest?

When pgbackrest performs a restore, it will write out the recovery.conf
file for you which includes the restore command to pull the WAL from the
repo and stanza configured.  You shouldn't need to worry about where
those files are, specifically, coming from (and it's even possible that
it might change in the future...).  Is there a specific reason you're
asking?

> 3. What is the compression level by default in pgbackrest?

pgbackrest will use different compression levels depending on what
you're doing.  If your repo is compressed (the default), then zlib level
6 is used.  If you're storing your repo as uncompressed, then pgbackrest
will use level 3 compression for the network transfer.

Note that, in any case, pgbackrest only performs compression once- at
the source.  If the repo is compressed then the data is compressed to
level 6 at the source and then streamed directly out to disk in that
compressed form.  If the repo is uncompressed, then the data is
compressed to level 3 at the source and then decompressed before going
into the repo.

Thanks!

Stephen




Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-15 Thread Stephen Frost
Greetings,

* Enrico Thierbach (e...@open-lab.org) wrote:
> I am using `SELECT * FROM queue ... FOR UPDATE SKIP LOCKED` to implement a
> queueing system.
> 
> Now I wonder if it is possible, given the id of one of the locked rows in
> the queue table, to find out which connection/which transaction owns the
> lock.

Sure, you can attempt to lock the record and then run pg_blocking_pids()
(in another session) against the pid which is trying to acquire the
lock.

Session #1:

Connect
SELECT * FROM queue ... FOR UPDATE SKIP LOCKED;
... gets back some id X
... waits

Session #2:

Connect
SELECT pg_backend_pid(); -- save this for the 3rd session
SELECT * FROM queue WHERE id = X FOR UPDATE;
... get blocked waiting for #1
... waits

Session #3:

SELECT pg_blocking_pids(SESSION_2_PID);
-- returns PID of Session #1

Obviously there are race conditions and whatnot (what happens if session
#1 releases the lock?), but that should work to figure out who is
blocking whom.

If you're on a version of PG without pg_blocking_pids then you can look
in the pg_locks view, though that's a bit more annoying to decipher.
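
For example, something along these lines (just a sketch, and the exact
columns you want may vary) will at least show which backends hold or
are waiting on locks involving the table in question:

  SELECT l.pid, l.mode, l.granted, a.query
    FROM pg_locks l
    JOIN pg_stat_activity a ON a.pid = l.pid
   WHERE l.relation = 'queue'::regclass;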

Thanks!

Stephen




Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-15 Thread Stephen Frost
Greetings,

* Enrico Thierbach (e...@open-lab.org) wrote:
> I guess with your query I can figure out which connection holds a lock, but
> it seems I cannot correlate those locks to the rows which actually are
> locked, since `pg_locks` seems not to reference this in any way.

What I gave you would work, or you could use the pgrowlocks extension:

https://www.postgresql.org/docs/current/static/pgrowlocks.html

Using pgrowlocks will be slow if you have a lot of records in the table
though, hence the other approach I mentioned since it sounds like you
know the ID that you're interested in.
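
Usage is roughly (assuming the table from your example is called
'queue'):

  CREATE EXTENSION pgrowlocks;

  SELECT locked_row, locker, modes, pids
    FROM pgrowlocks('queue');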

Thanks!

Stephen




Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-15 Thread Stephen Frost
Greetings Melvin,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> >I guess with your query I can figure out which connection holds a lock,
> but it seems I cannot correlate those locks to the rows which actually are
> locked, since pg_locks seems not to reference this in any way.
> 
> *FWIW, I really don't understand your need to identify the actual rows that
> are locked. Once you have identified the query that is causing a block
> (which is usually due to "Idle in Transaction"), AFAIK the only way to
> remedy the problem is to kill the offending query, or wait for it to
> complete. I am not aware of any way available to a user to "unlock"
> individual rows". Indeed, if you could, it would probably lead to
> corruption of some form.*

No, locks are not able to be released mid-transaction.  That said, it
can be difficult sometimes to determine which of the many sessions is
holding a lock on a specific row, hence the two approaches I provided,
which actually address the question which was raised.  While the
use-case might not be on completely solid ground here, I don't think
it's entirely unreasonable, so I don't think there's any need to tell
the OP that what they're asking for isn't really what they want, in this
case.

> *BTW, the query I provided WILL work in version 10.  The commented section
> was for v9.1 and prior, as "someone" felt it necessary to rename some
> fields in pg_stat_activity*
> *and remove/replace another field. Hopefully they will refrain from doing
> so in the future, as it breaks queries and applications.*

Changes will continue to be made between major versions of PostgreSQL
when they're deemed necessary; I'd suggest those applications be
prepared to adjust on a per-major-version basis when future changes
happen.  We do have quite a bit of discussion about changes which are
made and they are not done so without good justification, but they can
and do happen.

Thanks!

Stephen




Re: PgBackrest questions

2018-03-15 Thread Stephen Frost
Greetings,

* chiru r (chir...@gmail.com) wrote:
> Please respond to my PgBackrest questions,if any one tested.

Please don't spam the lists repeatedly like this.  The responses to this
mailing list are provided by the community on a volunteer basis and
repeated emails are more likely to discourage responses than encourage
them.

Thanks!

Stephen




Re: PgBackrest questions

2018-03-15 Thread Stephen Frost
Greetings,

* chiru r (chir...@gmail.com) wrote:
> On Wed, Mar 14, 2018 at 6:17 PM, Stephen Frost  wrote:
> > * chiru r (chir...@gmail.com) wrote:
> > > I am testing Pgbackrest and I have few questions.
> >
> > Great!
> >
> > > 1. I used postgres user to perform backups and restores with Pgbackrest
> > > tool.
> > > The Trust authentication in pg_hba.conf file is working without issues.
> >
> > Please don't use 'trust'.
> >
> > > If I use md5 authentication in pg_hba_conf file and postgres user
> > password
> > > .pgpass file, the pgbackrest backup is failing.
> >
> > There really shouldn't ever be a need to use md5 authentication with
> > pgbackrest.  Instead, I'd strongly suggest you use 'peer'.  The 'peer'
> > method is perfectly safe as it depends on the authentication which
> > Unix provides, but it doesn't require a password or any of the
> > associated complications.  Note that 'trust' should *not* be used and
> > I'm glad to see that you're looking for alternatives to using 'trust'.
>
> Is there any alternative method other than Peer and Trust  to use with
> pgbackrest tool?.

pgbackrest uses Perl's DBD::Pg to connect to PostgreSQL, which should
pick up on having a password provided through the environment variable
PGPASSWORD or that of DBI_PASS.  I would suggest setting one of those to
the password instead of trying to use .pgpass (though, given that
DBD::Pg uses libpq underneath, it's not clear to me off-hand why .pgpass
wouldn't work, but it's possible it doesn't because the username is
explicitly provided by pgbackrest to DBD::Pg's connect() routine).
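
If you go that route, a sketch would be something along the lines of
(the stanza name 'main' is just a placeholder here):

  export PGPASSWORD=yourpassword
  pgbackrest --stanza=main backup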

> > > 2.  All the restores through the backrest is going to refer the  Wal
> > > archive files under archive//* directory, which are taken
> > from
> > > pgbackrest?
> >
> > When pgbackrest performs a restore, it will write out the recovery.conf
> > file for you which includes the restore command to pull the WAL from the
> > repo and stanza configured.  You shouldn't need to worry about where
> > those files are, specifically, coming from (and it's even possible that
> > it might change in the future...).  Is there a specific reason you're
> > asking?
>
> I am concerned about WAL archives because, I am planning to transfer my
> backups to Tape as soon as pgbackrest completes backup on disk.
> The restore of backups is going to be on different server.So i am concerned
> about recovery.
> 
> Ex: Always I restore my production backups on DEV environment.
> 
> And also I am planning to *remove *the Full,Diff and incremental backups on
> disk as soon as it transferred to tape. Is there any issue? since I am
> removing backups instead of Pg-backrest tool.

I would strongly recommend against trying to expire out the backups
yourself as you have to take a great deal of caution when it comes to
expiring the WAL- if you expire the wrong WAL then your backups will no
longer be valid.  pgbackrest has expiration options and an expiration
process which runs automatically and which will take care of cleaning
out any WAL which is no longer useful.

We do have plans for improving the options around expiration.

pgbackrest does have an option called '--archive-copy' which will copy
the WAL files required to restore a given backup (and only that WAL)
into the backup directory.  If you use that option and disable
compression then the backup directory for a full backup can be copied
to a new location directly and started as a PG server, where it will
replay all of the WAL.  If you're using differentials or incrementals,
then you would also need to use the --repo-hardlink option (and be on a
filesystem which supports hard-links) to allow you to do the same with
those backups.
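
A command-line sketch using just the options mentioned above (the
stanza name is a placeholder- do check the pgbackrest documentation for
your version for the exact option spellings):

  pgbackrest --stanza=main --type=full \
    --archive-copy --repo-hardlink --no-compress backup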

In general, this is all a great deal simpler by making the repository
available for doing restores instead of having to ship things between
repos in the way you're describing.  It's also possible to perform a
regular filesystem-level backup and restore of the repository itself and
use that directly instead of trying to extract things from the repo.

> 4. I observed that the *backup.info* and
> *backup.info.copy* files under stanza directory. I compared both files I
> did not see any difference.
> What is the reason to keep two files with same contents in same
> directory ?

We don't want a crash or other issue to cause the backup.info file to be
only half-written, or similar, hence the reason for two copies.

> 5. The *backup.manifest* and *backup.manifest.copy* files exist under each
> backup directory(full/diff/incremental). What 

Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-15 Thread Stephen Frost
Greetings Melvin,

* Melvin Davidson (melvin6...@gmail.com) wrote:
> On Thu, Mar 15, 2018 at 10:14 PM, Stephen Frost  wrote:
> > Changes will continue to be made between major versions of PostgreSQL
> > when they're deemed necessary; I'd suggest those applications be
> > prepared to adjust on a per-major-version basis when future changes
> > happen.  We do have quite a bit of discussion about changes which are
> > made and they are not done so without good justification, but they can
> > and do happen.
> 
> Yes, Stephen, I certainly understand making changes to system catalogs
> _when necessary_.
> That being said, the first change was the renaming of pid to procpid in
> pg_stat_activity.
> However, I contend that was more because someone felt that it was more to
> make the column names
> consistent across catalogs, rather than necessity. Care should be taken to
> consider the need and
> effect of changing EXISTING system columns. It may have been a mistake in
> originally naming it,
> but it was certainly working and not causing any problems at the time.

As mentioned earlier, care is taken when considering the need and effect
of changing existing system columns, but it can, and will, happen, and
therefore applications should be designed to cope with such changes
gracefully when they happen and tested thoroughly on each new major
version of PostgreSQL.

Thanks!

Stephen




Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-16 Thread Stephen Frost
Greetings,

Please don't top-post.

* Melvin Davidson (melvin6...@gmail.com) wrote:
> this whole discussion started because Enrico did not originally specify the
> PostgreSQL version he was working with. So after he did advise it was for
> 9.6, I felt it necessary to explain to him why a certain section of my
> query was commented out and that it would also work for 10. I have
> previously made it a policy to request ops include the PostgreSQL version
> and O/S when submitting to this list, but I was berated by others for
> always requesting ops to provide that extremely difficult information to
> obtain.

There's a difference between saying that a particular query is intended
for certain versions and asking what version someone is running while
providing useful information, versus just immediately replying to every
email which doesn't specify the version and OS to ask for them.  The
former is being helpful and specific while soliciting additional
information, while the latter tends to just create noise on the list,
particularly when the question isn't ultimately relevant.

Note that the above comments are entirely generic- I'm not aware of the
specific emails which you're referring to or why it was suggested that
they weren't helpful.  Specific concerns regarding list usage should
really be addressed to the list moderators and not individuals taking
action on their own.

> I also felt it important that I express my opinion that the changes needed
> were caused by what I felt was cosmetic and unnecessary changes to the
> catalog. There is an old saying "If it ain't broke, don't fix it" and that
> certainly applies here.

There was a great deal of discussion and consideration for the changes
and specific reasons why they were made.

> Now, as to your request for me to read the thread in the url's you
> suggested, I did read most of the content. I note that the entire
> discussion was amongst
> PostgreSQL-development , So at no time
> was the generic population of PostgreSQL users and DBA's involved.

That version of PostgreSQL, as with all of them, went through a beta
period where we specifically ask for feedback from users.  Serious
concerns raised during that time period do result in changes being made
prior to release.

> Therefore, said population, myself included, had no foreknowledge of the
> intended changes which is the cause of the problem. Therefore your
> statement "you might want to consider speaking up in some reasonable time
> frame, not six years later" is abrasive at best, since I, and others, only
> found out about it after the fact. Not to mention, even if I did complain
> earlier, I seriously doubt the changes could or would be reversed.

If you're interested in having a say in what will be included in the
next version of PostgreSQL then I strongly encourage you, and all users,
to work with the beta packages that are put out, every year, for people
to test with.

> At this point I have said all I have to say and will discuss it no further.
> I can only strongly recommend that in the future, proposed changes to
> system catalogs that could adversely affect existing scripts and
> applications be sent to the generic PostgreSQL population (IE:
> pgsql-general@lists.postgresql.org) for comment BEFORE said changes are
> implemented.

Beta releases are announced through the pgsql-announce mailing list,
which is a list that has a great deal less traffic than -general and one
which I'd suggest all users subscribe to, to be aware of changes which
are likely to be in the next release and to test to see if there are any
serious issues, and also to update their applications and queries in
advance of changes being released.

Thanks!

Stephen




Re: SELECT .. FOR UPDATE: find out who locked a row

2018-03-16 Thread Stephen Frost
Enrico,

* Enrico Thierbach (e...@open-lab.org) wrote:
> >*FWIW, I really don't understand your need to identify the actual rows
> >that
> >are locked. Once you have identified the query that is causing a block
> >(which is usually due to "Idle in Transaction"), AFAIK the only way to
> >remedy the problem is to kill the offending query, or wait for it to
> >complete. I am not aware of any way available to a user to "unlock"
> >individual rows". Indeed, if you could, it would probably lead to
> >corruption of some form.*
> 
> The goal is to run a job queue, with a potentially largish number of workers
> that feed of the queue. So it would be useful to find out which queue entry
> is being processed right now (I can easily find out: when a row cannot be
> read via SKIP UNLOCKED it is locked, and probably being worked upon.) It
> would also be great to understand which worker holds the lock. The intention
> is NOT to kill the worker or its query.
> 
> With what the conversation brought up here (still trying to catch up with
> everything) I can:
> 
> 1) determine all workers that currently are holding a lock (via Melvin’s);
> 2) on an individual base try to lock the row in a second connection and use
> a third connection to figure out which worker connection holds a lock on a
> specific single row (via Stephen’s).
> 
> This is probably good enough to cover the necessary basic functionality, so
> thank you for your input.
> 
> Am I correct to assume that there is no other way to determine who is
> holding a lock on a specific row and/or determine this for many rows in one
> go?

Evidently my second email got lost somewhere along the way- what you're
looking for is an extension called 'pgrowlocks':

https://www.postgresql.org/docs/10/static/pgrowlocks.html

My prior email on that subject is here:

https://www.postgresql.org/message-id/20180315220512.GV2416%40tamriel.snowman.net

> (I guess I am also correct to assume that whatever the worker is doing there
> is no way to somehow write this information into the database **via the same
> connection**. (Using a second connection would be obviously easy)

You can write it into the database, of course, but you can't read that
information out by some other process.  Depending on what you're doing,
you can use RAISE NOTICE to send messages back to the connected client
from within a stored procedure or similar.
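
A minimal sketch of that (the function and message here are made up
purely for illustration):

  CREATE OR REPLACE FUNCTION claim_job(job_id bigint) RETURNS void AS $$
  BEGIN
      -- sent to the connected client as a NOTICE, not stored anywhere
      RAISE NOTICE 'worker % is processing job %',
          pg_backend_pid(), job_id;
      -- ... do the actual work here ...
  END;
  $$ LANGUAGE plpgsql;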

Thanks!

Stephen




Re: changing my mail address

2018-03-17 Thread Stephen Frost
Greetings,

* wambac...@posteo.de (wambac...@posteo.de) wrote:
> how can i change my mail adress for the postgresql mailing lists? adding my
> new address worked, but how do i get rid of the old one?

You'll need to change it on postgresql.org:

https://www.postgresql.org/account/

Once you've done that, log out of all PG sites (possibly by deleting
cookies which you may have from them) and then log into postgresql.org
first and then go to lists.postgresql.org and the update should be
passed through.

Thanks!

Stephen




Re: changing my mail address

2018-03-17 Thread Stephen Frost
Greetings,

* Ron Johnson (ron.l.john...@cox.net) wrote:
> On 03/17/2018 10:51 AM, Stephen Frost wrote:
> >Once you've done that, log out of all PG sites (possibly by deleting
> >cookies which you may have from them) and then log into postgresql.org
> >first and then go to lists.postgresql.org and the update should be
> >passed through.
>
> Sadly, https://www.postgresql.org/account/reset/ hasn't worked for me in 
> yearss.

That wasn't what the OP was asking about, but if you're having an issue
there, then please explain exactly what you're trying to do and what
issue you're seeing.  We're happy to try and resolve any such issues.

Thanks!

Stephen




Re: changing my mail address

2018-03-18 Thread Stephen Frost
Greetings Ron,

* Ron Johnson (ron.l.john...@cox.net) wrote:
> On 03/17/2018 01:08 PM, Stephen Frost wrote:
> >* Ron Johnson (ron.l.john...@cox.net) wrote:
> >>On 03/17/2018 10:51 AM, Stephen Frost wrote:
> >>>Once you've done that, log out of all PG sites (possibly by deleting
> >>>cookies which you may have from them) and then log into postgresql.org
> >>>first and then go to lists.postgresql.org and the update should be
> >>>passed through.
> >>Sadly, https://www.postgresql.org/account/reset/ hasn't worked for me in 
> >>yearss.
> >That wasn't what the OP was asking about, but if you're having an issue
> >there, then please explain exactly what you're trying to do and what
> >issue you're seeing.  We're happy to try and resolve any such issues.
> 
> Having https://www.postgresql.org/account/reset/ send an password email to
> the address where I receive list emails hasn't worked for me in years. 
> (Yes, I looked in the Spam folder, and the Inbox, and the folder where
> -general emails are filtered to.)

If you could contact me off-list with exactly the email address you're
using, we can look into it.

Note that the account reset page will only work if you have an existing
account and only for the (single) email address associated with that
account.  Other email addresses which are added through
https://lists.postgresql.org are not actually able to be used to log
into https://postgresql.org (at least, not yet..  maybe one day).

> When I'd stopped using PG for a while, the simplest solution was to send
> them to the Trash.

These days, each email from the mailing list includes a link which can
be used to unsubscribe from that list without having any account.

Thanks!

Stephen




Re: Rationale for aversion to the central database?

2018-04-08 Thread Stephen Frost
Greetings,

* Alvaro Aguayo Garcia-Rada (aagu...@opensysperu.com) wrote:
> 1. Portability. Being tied to a single database engine is not always a good 
> idea. When you write business logic in database, you have to write and 
> maintain your store procedures for every database engine you want to support. 
> That can be really complicated, and will surely take pretty much time, as 
> programming languages for different databases are very different from each 
> other. And it's permanent: Every time you make a change to a store procedure, 
> you must make that change for every supported database.

The portability claim tends to be both a false one and often, when
realized, results in a solution where you aren't using the database for
anything complicated and you'd be better off with a much simpler data
store.  You also don't actually offer any justification for the claim
that being tied to a single database engine is not always a good idea-
why is that?  With commercial databases it tends to be because you are
at the behest of some very large commercial company- but that isn't an
issue with PostgreSQL.  The next argument may be that the project might
go away and force a move to another database, but PG's track record is
hard to beat in this area given the history and the number of people
working to make it better and keep it maintained year after year.

Ultimately, you really should be thinking of the database as the
language of your data.  You wouldn't write your app in multiple
different programming languages, would you?  What if Python suddenly
goes away as a language, or Java does?  Would you write every piece of
software twice, so that you can flip over to using Python instead of
Java on a whim?

> 2. Performance. I still don't have it clear, but, presumably, plpgsql(among 
> others) may not have the same performance as Java, Ruby, Python, C++, or any 
> other programming language. Also, when your application runs outside of the 
> database server, having business logic on database will make your app use 
> more resources from the database server, thus rendering is slow for all other 
> operations. This goes against separating database and application.

No, plpgsql doesn't have the same performance characteristics as Java,
Ruby, Python, C++, or many other languages- but that's why it isn't the
only language which is supported in PostgreSQL.  You can write back-end
functions in another of those languages, plus quite a few others
including Javascript, R, Perl.  As for the question about if it'll
actually result in the database server being more taxed or not- that
really depends.  Aggregates are a great example- is it going to be
cheaper for the database to run 'sum()' across a data set and give you
the result, or for it to take every row from that table and ship it to
a client?  There are certainly examples which can go the other way too,
of course, but it's really something to think about on an individual
basis rather than something to decide with a broad stroke.  In
particular, the assumption that you'll get better performance by moving
the code away from the data tends not to hold.
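
To make the aggregate example concrete (table and column names are made
up), compare:

  -- the database does the work and ships back a single row
  SELECT sum(amount) FROM orders WHERE order_date >= '2018-01-01';

  -- versus shipping every matching row to the client to be added up there
  SELECT amount FROM orders WHERE order_date >= '2018-01-01';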

> However, there are some cases when you may want or need to use business logic 
> on database: when you need to fetch large volumes of data to produce some 
> report. This is the case of some accounting reports in complex ERPs. The only 
> advantage store procedures have is they run INSIDE the database, so there's 
> no TCP/IP overhead and no network latency when the store procedure make a 
> large query. Even running in the same host, fetching large volumes of data 
> will always be faster from a store procedure.

This is what I'm getting at above, but I would caution that looping over
a table in a stored procedure is generally much less performant than
finding a way to express what you want in SQL.
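
As a tiny illustration of that point (names made up), a row-by-row loop
inside a plpgsql function, such as:

  FOR r IN SELECT id, amount FROM invoices WHERE paid IS NULL LOOP
      UPDATE invoices SET late_fee = r.amount * 0.05 WHERE id = r.id;
  END LOOP;

is generally much slower than the equivalent single statement:

  UPDATE invoices SET late_fee = amount * 0.05 WHERE paid IS NULL;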

I'm afraid that the other advantages of doing more in the database
aren't really being considered in your arguments above either- things
like having constraints all checked in one place, regardless of the
application, and the ability to have interfaces defined which multiple
applications could operate against and know that they're all going to be
getting back the same, consistent, results from the database since it's
the same code underneath.  Some of that can be done by sharing code
between the applications, of course, but there's a great deal more risk
there (what about when the applications need to change something that's
done in that shared code, so they essentially fork it..?  Or the
applications have to be upgraded at different times, or a variety of
other situations which could lead to that common code diverging, or even
when the applications aren't written in the same language...).

Just some food for thought.

Thanks!

Stephen




Re: Rationale for aversion to the central database?

2018-04-08 Thread Stephen Frost
Greetings,

* Ravi Krishna (sravikrish...@gmail.com) wrote:
> >> I am however very comfortable with using psql and PL/pgSQL and I am very
> >opinionated.
> >
> Nothing wrong with this approach and it may very well work 90% of the time.
> Until ... a day comes when
> you need to migrate out of PG to another RDBMS.  Good luck at that time.

Oh, don't worry, people do that.  What I particularly enjoy are the
stories (of which there are a number now...) where people moved away
from PG for some reason or another, then to another database, to
another, and another, and finally back to PG again, much the wiser for
it but also rather battle-worn. :)

Thanks!

Stephen




Re: algo for canceling a deadlocked transaction

2018-04-09 Thread Stephen Frost
Greetings,

* Thomas Poty (thomas.p...@gmail.com) wrote:
> My question is : In case of a deadlock between 2 transaction,  how to know
> which transaction will be canceled? Is it predictable?

The short answer is "it's whichever one detected the deadlock."  The
deadlock check fires once a process has been waiting on a lock for
deadlock_timeout, and if a deadlock is detected then the process which
detected it is canceled.

I'd strongly recommend reviewing your application and addressing
deadlocks by changing how the application acquires locks to be
consistent and to avoid lock escalation instead of worrying about how to
predict a deadlock- a properly designed and written application
shouldn't be causing deadlocks to happen in the first place.
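
The usual approach is to always take row locks in a consistent order.
As a sketch (table name made up), instead of two transactions locking
ids (1,2) and (2,1) in opposite orders, have both do:

  SELECT * FROM accounts WHERE id IN (1, 2) ORDER BY id FOR UPDATE;

so every transaction acquires the locks in the same (id) order and
simply waits instead of deadlocking.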

Thanks!

Stephen




Re: Postgresql database encryption

2018-04-20 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> On 04/20/2018 03:55 PM, Vick Khera wrote:
> >On Fri, Apr 20, 2018 at 11:24 AM, Vikas Sharma wrote:
> >
> >For anyone to offer a proper solution, you need to say what purpose your
> >encryption will serve. Does the data need to be encrypted at rest? Does it
> >need to be encrypted in memory? Does it need to be encrypted at the
> >database level or at the application level? Do you need to be able to
> >query the data? There are all sorts of scenarios and use cases, and you
> >need to be more specific.
> >
> >For me, using whole-disk encryption solved my need, which was to ensure
> >that the data on disk cannot be read once removed from the server.
> 
> Someone really needs to explain that to me. My company-issued laptop has
> WDE, and that's great for when the machine is shut down and I'm carrying it
> from place to place, but when it's running, all the data is transparently
> decrypted for every process that wants to read the data, including malware,
> industrial spies,
> 
> Thus, unless you move your DB server on a regular basis, I can't see the
> usefulness of WDE on a static machine.

The typical concern (aka, attack vector) isn't around moving the DB
server on a regular basis or about someone breaking into your data
center and stealing your drives, it's making sure that disposal of
equipment doesn't result in valuable data being retained on the
drives when they leave the data center for replacement or disposal.

Thanks!

Stephen




Re: Postgresql database encryption

2018-04-20 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> On 04/20/2018 06:11 PM, Stephen Frost wrote:
> >* Ron (ronljohnso...@gmail.com) wrote:
> >>On 04/20/2018 03:55 PM, Vick Khera wrote:
> >>>On Fri, Apr 20, 2018 at 11:24 AM, Vikas Sharma wrote:
> >>>
> >>>For anyone to offer a proper solution, you need to say what purpose your
> >>>encryption will serve. Does the data need to be encrypted at rest? Does it
> >>>need to be encrypted in memory? Does it need to be encrypted at the
> >>>database level or at the application level? Do you need to be able to
> >>>query the data? There are all sorts of scenarios and use cases, and you
> >>>need to be more specific.
> >>>
> >>>For me, using whole-disk encryption solved my need, which was to ensure
> >>>that the data on disk cannot be read once removed from the server.
> >>Someone really needs to explain that to me. My company-issued laptop has
> >>WDE, and that's great for when the machine is shut down and I'm carrying it
> >>from place to place, but when it's running, all the data is transparently
> >>decrypted for every process that wants to read the data, including malware,
> >>industrial spies,
> >>
> >>Thus, unless you move your DB server on a regular basis, I can't see the
> >>usefulness of WDE on a static machine.
> >The typical concern (aka, attack vector) isn't around moving the DB
> >server on a regular basis or about someone breaking into your data
> >center and stealing your drives, it's making sure that disposal of
> >equipment doesn't result in valuable data being retained on the
> >drives when they leave the data center for replacement or disposal.
> 
> That makes some sense, but years of added CPU overhead to mitigate a problem
> that could be solved by writing zeros to the disk as a step in the decomm
> process seems more than a bit wasteful.

This presumes that the drive is still functional enough to be able to
overwrite it with zeros, and that overwriting it with zeros would be
sufficient.  Neither is, necessarily, accurate.

Thanks!

Stephen




Re: Enhancement to psql command, feedback.

2018-05-09 Thread Stephen Frost
Greetings,

* John McKown (john.archie.mck...@gmail.com) wrote:
> Again, this is just a discussion point. And I'm quite willing to admit
> defeat if most people don't think that it is worth the effort.

For my 2c, at least, I do think it'd be kind of neat to have, but we'd
need a fool-proof way to realize that's how we're being called and,
ideally, that would be something we could detect without having to have
special flags for psql which anyone writing such a script would have to
be aware of.

Do you know if there's a way to detect that we're being called this
way..?

Thanks!

Stephen




Re: Will Altering and Modifying tables during backup result in a corrupted server after the restore?

2018-05-21 Thread Stephen Frost
Greetings,

* Yashwanth Govinda Setty (ygovindase...@commvault.com) wrote:
> 1.   Creating a big table. Identify the physical file on the disk.
> 1.  While backup process is backing up a file associated with the table - 
> update the rows , add a column.
> 2.  Restore the server with transaction logs
> 
> We are backing up (copying) the entire postgres data directory. The 
> database/table file being backed up (copied), is modified by running 
> alter/update queries on the table.
> When we restore the copied data directory and replay/apply the transaction 
> logs, will the server be restored to an healthy state?

You haven't really spelled out your actual backup process but it
certainly sounds like it's lacking.

> (The files modified during backup can be corrupted, will this affect the 
> restore?)

With a proper PG-style filesystem-backup, any corruption due to ongoing
writes from PG will be handled by the transaction log.

To perform a proper PG-style filesystem-backup, you need to:

- Ensure that WAL is being archived somewhere.  This can be done with
  archive_command or with pg_receivewal.  All WAL archived during the
  backup *must* be saved or the backup will be inconsistent and
  incomplete (and, therefore, basically useless).

- Make sure to run pg_start_backup() before you begin copying *any*
  files

- Make sure to run pg_stop_backup() after you have copied all files

- Verify that all of the WAL generated between the pg_start_backup()
  call and the pg_stop_backup() call have been archived.  The
  information about what WAL is needed is returned from those calls.

- On restore, create and populate the backup_label file in the data
  directory to indicate that it was a backup being restored.
  (Alternatively, create that file during the backup itself and store it
  in the backup system, to be restored when the backup is restored).
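
As a very rough sketch of that flow using the non-exclusive method on
9.6+ (illustration only- not a replacement for a proper backup tool):

  -- session 1: must stay connected for the entire copy
  SELECT pg_start_backup('nightly', false, false);

  -- in a shell, while session 1 stays open, copy the data directory
  -- to the backup location

  -- back in session 1: returns the stop LSN plus the backup_label and
  -- tablespace_map contents, which must be saved with the backup
  SELECT * FROM pg_stop_backup(false);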

Ideally, you'd also verify the page-level checksums (if they're enabled)
when doing the backup, calculate your own checksum of the file (to
detect if it gets corrupted between the backup time and the restore
time) and verify that everything is physically written out to permanent
storage before claiming to have a successful backup.

Further information is available here:

https://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-BASE-BACKUP

Generally speaking, however, I would strongly discourage people from
trying to write yet-another-PG-backup-tool, there are several already,
my favorite being pgbackrest, and contributing to one of them would be
a better approach.

Thanks!

Stephen




Re: Will Altering and Modifying tables during backup result in a corrupted server after the restore?

2018-05-21 Thread Stephen Frost
Greetings,

* Christoph Moench-Tegeder (c...@burggraben.net) wrote:
> ## Yashwanth Govinda Setty (ygovindase...@commvault.com):
> 
> >   2.  Restore the server with transaction logs
> 
> This is missing a lot of details. If you do it right - see your email
> thread from one week ago - you will be able to recover the database
> server to a state as of the _end_ of the backup process (as marked by
> the return of the pg_stop_backup() command).
> If you do not follow the backup/restore documentation to the letter,
> the database will be corrupted and will not start (sometimes people
> report with haphazard backup schemes, but that's just more luck than
> they deserve, and nobody should rely on that).

Please also note that the PG documentation provided, when it comes to
the examples, are purely for usage demonstration only and shouldn't be
considered a good idea when it comes to implementing an actual solution.

Using only "cp" for archive_command is a particularly bad idea as it
doesn't sync the file to disk.  Be sure to also heed the recommendation
about using the non-exclusive backup method and *not* using the
exclusive backup method.

Thanks!

Stephen




Re: Syndicating PostgreSQL mailing list to Discourse

2018-05-27 Thread Stephen Frost
Greetings,

* Erlend Sogge Heggen (e.so...@gmail.com) wrote:
> Since it's read-only it would only be used for things like:
> 
>- Fast search with advanced filters (Discourse puts PostgreSQL full text
>search to good use!)

While it might not be the case for other projects, we actually do use
PostgreSQL for our archives, including having FTS..

>- All active lists aggregated into one feed, but also available as
>individual categories that can be tracked/watched.

I'm curious what would be different here from what our archives provide.
We could certainly have a single "all lists" archive page but that seems
more likely to be just completely confusing than actually useful at all.

>- Single-page topics, easier to read on mobile for the young'uns.

We've been working to make the mobile experience better for our
archives and we'd love to make the experience better for everyone, so
please make suggestions or even send in patches; all of the code running
the archives is open.

> If our mirror archive gets enough traction we'd like to make it possible to
> sign up to the forum and seamlessly interact with the mailing list,
> provided it's something the community actually wants of course. We're doing
> a similar experiment with the ruby-talk mailing list, which you can see
> being tested  at https://rubytalk.org/.

How do you plan to address the issues around DMARC/SPF/DKIM..?  If the
answer is "we don't plan to do anything" or "we are going to send email
from our own domain" then you're certainly not making it "seamless" for
us or for the user.  If you have a different solution, then I think
we're certainly curious to hear it, as it would be nice to change the
-bugs and -docs forms back to using the end user's email address when
sending to the list instead of having to have a 'noreply' address be
used. 

Please do *not* start causing us trouble by sending what looks like
forged email through our mailing lists and causing bounces for us to
deal with.  We have more than enough of that already and if it becomes
an issue then we'll have to block, bounce, and/or unsubscribe whatever
is causing it.

Thanks!

Stephen




Re: LDAP authentication slow

2018-05-30 Thread Stephen Frost
Greetings,

* C GG (cgg0...@gmail.com) wrote:
> This is PostgreSQL 9.5 -- We just enabled LDAP(S) authentication (to an
> Active Directory server) for a certain grouping of users

You really shouldn't be using LDAP auth to an Active Directory system.
Active Directory supports Kerberos, which is a much more secure way to
authenticate.

> I'm getting complaints from the users authenticating using ldap that
> database operations are taking quite a bit longer than they were previously
> when they were authenticating with MD5 stored passwords. Clearly, there's
> more machinery at work with that kind of operation, but the increase in
> time is way more that I would have estimated.

Using Kerberos/GSSAPI will be much faster as you only need to get a
ticket and then that can be reused to authenticate multiple times to the
database without having to contact the LDAP server.
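
For comparison, the pg_hba.conf side of that is small (a sketch- the
realm and network here are placeholders, and the database server needs
a keytab for its service principal):

  host  all  all  10.0.0.0/8  gss  include_realm=0 krb_realm=EXAMPLE.COM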

> I still want them to be able to type in a password, so GSSAPI is out for an
> alternative (right?) ... Is there something I can do to help speed things
> up? If there any telemetry that I can generate (logs, stats, etc.) which
> might be able to pinpoint a bottleneck?

What's the reason for wishing for them to "be able to type in a
password"?  With GSSAPI/Kerberos, users get true single-sign-on, so they
would log into the Windows system with a password and then have a TGT
which can be used to authenticate to other services without having to
type in their password over and over again.

Thanks!

Stephen




Re: LDAP authentication slow

2018-05-30 Thread Stephen Frost
Greetings,

* C GG (cgg0...@gmail.com) wrote:
> On Wed, May 30, 2018 at 12:04 PM, Stephen Frost  wrote:
> > What's the reason for wishing for them to "be able to type in a
> > password"?  With GSSAPI/Kerberos, users get true single-sign-on, so they
> > would log into the Windows system with a password and then have a TGT
> > which can be used to authenticate to other services without having to
> > type in their password over and over again.
>
> By and large, we're using pre-existing tools that would have to be heavily
> modified to co-opt GSSAPI as an authentication method. For some
> tools/applications, it's just not practical to use a ticket. But the
> username/password paradigm is well supported. Most of these tools aren't
> being used on Windows machines. That's not to say that Linux and macOS
> don't have robust Kerberos tools available for use, but thinking that
> Kerberos tickets are just floating out there in login-space waiting to be
> used by psql and other tools isn't really the case in our environment.

This argument doesn't really hold much weight.  Anything using libpq is
likely to work well with GSSAPI and most languages base their access to
PG on libpq.  Ensuring that a ticket is available also isn't hard with
k5start or similar.  Even proxying tickets with mod_auth_kerb or similar
isn't all that difficult to get going to leverage SPNEGO.

> The main reason for moving to LDAP(S) is the ability to enforce password
> quality and rotation without the risk of disclosure (
> https://www.postgresql.org/docs/10/static/passwordcheck.html) ... Allowing
> pre-hashed passwords to be sent to the back-end circumvents any protections
> passwordcheck might give. Plus, passwordcheck isn't available in all
> PostgreSQL environments (I'm specifically thinking of AWS RDS).

This seems entirely out-of-place and not related to GSSAPI (pre-hashed
passwords..?).  Further, password quality and rotation would be able to
be handled by AD instead of trying to do it in PG, though this also
seems to be conflating different things (are you talking about access
from an application, whose password should be randomly generated to
begin with, or users..?).

> Unless I've missed something GSSAPI is still out of consideration if we're
> having to supply username/password combinations in connection strings.

There continues to be some confusion here as with GSSAPI you
specifically wouldn't need to include passwords in connection strings.

> I am still wondering what can we do to speed LDAP(S) up? Is there a
> speedier authentication delegation paradigm that utilizes username/password
> from the client?

Passing passwords around between different systems for authentication is
likely to be expensive and insecure.  Using SCRAM on PG would at least
avoid the call out from the PG server to the LDAP server but then you
would have different passwords on the different systems potentially.

The solution to these issues is to move away from passing passwords
around, as Active Directory did.

Thanks!

Stephen




Re: Code of Conduct plan

2018-06-05 Thread Stephen Frost
Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Tue, Jun 5, 2018 at 4:45 PM, Chris Travers 
> wrote:
> > If I may suggest:  The committee should be international as well and
> > include people from around the world.  The last thing we want is for it to
> > be dominated by people from one particular cultural viewpoint.
> 
> It will be. This is the PostgreSQL *global* development group and project,
> after all. Yes, there is definitely a slant in the project in general
> towards the US side, as is true in many other such projects, but in general
> we have decent coverage of other cultures and countries as well. We can't
> cover them all on the committee (that would make for a gigantic
> committee), but we can cover it with people who are used to communicating
> and working with people from other areas as well, which makes for a better
> understanding.
> 
> It won't be perfect in the first attempt, of course, but that one is
> covered.

This drives to a point which I was thinking about also- what is needed
on the committee are people who are worldly to the point of
understanding that there are different cultures and viewpoints, and
knowing when and how to ask during an investigation to get an
understanding of whether the issue is one of cultural differences (leading
potentially to education and not to reprimand, as discussed in the CoC),
something else, or perhaps both.

The CoC committee doesn't need to be comprised of individuals from
every culture to which the community extends, as that quickly becomes
untenable.

I'm confident that the Core team will work to ensure that the initial
committee is comprised of such individuals and that both Core and the
subsequent CoC committees will work to maintain that.

Thanks!

Stephen




Re: Code of Conduct plan

2018-06-05 Thread Stephen Frost
Greetings,

* Benjamin Scherrey (scher...@proteus-tech.com) wrote:
> On Wed, Jun 6, 2018 at 2:12 AM, Christophe Pettus  wrote:
> > Not at all.  The need for a CoC is not theoretical.  Real people,
> > recently, have left the community due to harassment, and there was no
> > system within the community to report and deal with that harassment.
> 
> I keep hearing this claim. I've followed up and tried to verify them. Sorry
> but "trust me" doesn't cut it here any more than "trust me this will make
> Postgres go faster" would on a code change. What's the context for this?
> What evidence do we have that indicates this CoC would have likely resulted
> in a different outcome? Without that then your claim does not even rise up
> to the standard of theoretical. Frankly this claim does not seem very
> plausible to me at all. Let's try to keep our standards here. I'm not
> trying to harp on you personally, it's just that you're the unlucky
> umpteenth time I've seen this claim made with zero satisfaction.

While I can't say for sure, I feel reasonably confident that the level
of proof you're asking for here isn't going to be forthcoming as it's a
matter that Core has decided is best kept private, not unlike what we
would expect the CoC Committee to do in instances where appropriate,
possibly at the request and/or agreement of the individual or
individuals involved.

So while I can understand why you're asking, it's not particularly
useful to continue to do so.  Specific suggestions about how to change
the proposed CoC would be useful, but the ongoing discussion about if
one is needed is not.

Thanks!

Stephen




Re: Load data from a csv file without using COPY

2018-06-19 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 06/19/2018 03:18 PM, Asif Ali wrote:
> >just tell me the site , i dont have time to waste on shitty things , i
> >will program a spammer to send email to this list
> 
> So why subscribe in the first place?

Thanks for the attempts at helping folks, but it's been addressed.
Please don't reply further on this sub-thread.

Thanks!

Stephen




Re: User documentation vs Official Docs

2018-07-16 Thread Stephen Frost
Greetings,

* Benjamin Scherrey (scher...@proteus-tech.com) wrote:
> One thing I recall very fondly about the early days of the Lamp stack was
> that the official documentation of PHP and MySQL was augmented with user
> created practical examples. It was still reference documentation organized
> by command or function, but in a comment-like section underneath the formal
> docs were user provided short practical examples of how the command would
> be used in real situations. One was able to teach themselves how to build a
> dynamic website front ending a database just by exploring the core docs and
> reading the examples.

We have a place for this to go, in the official docs, already split out
by version, and it starts here:

https://www.postgresql.org/docs/10/static/tutorial-start.html

Adding more to that certainly sounds good to me.

So, for my 2c, at least, "patches welcome."

Drive-by comments saying that we need a place for this, when we already
have one, and saying that the community should develop it, while not
acknowledging or contributing to what we already have, does not strike
me as particularly useful.

We tried having a comment area on the docs and those ultimately ended up
being... less than ideal.  I'm not anxious to repeat that experiment.
I'm glad it worked for other communities, but it didn't work for us and
we have a good bit of history to show that.

The best way to improve that section of the docs is to write up good
user example-based documentation and submit it as patches.  I'd
certainly be happy to see that and to try and help by moving such
patches forward to commit.

Thanks!

Stephen




Re: User documentation vs Official Docs

2018-07-18 Thread Stephen Frost
Greetings Vick,

* Vick Khera (vi...@khera.org) wrote:
> I didn't know it existed either, mostly because I know how to ask google to
> do things, and the things I need to know are not covered here (yet). This
> does seem to me to be the ideal place to add more how to documentation to
> augment all the reference docs we have.

Agreed.  It'd be great to have more tutorials in our official
documentation.

> As for some "strong SEO" I think already the top hit for almost everything
> I seek postgres related is the official manual, so it seems to have good
> SEO. The only big improvement would be somehow to tell google to only show
> me the newest version of the manual, not all of the older ones too, for the
> same page.

Agreed, and there's ongoing work to improve the situation regarding the
different versions of the manual and getting Google to show the
"current" URL in preference to the other versions.

Thanks!

Stephen




Re: User documentation vs Official Docs

2018-07-20 Thread Stephen Frost
Greetings,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> I don't see why we need this thread to continue.  This sounds like
> somebody looking for a solution when they don't yet know what the
> problem is.
> 
> If people want to contribute, there are already some places where they
> can do so.  Articles can be drafted in the wiki initially or, heck, even
> sites like StackOverflow[1], and if something gets to a level so great
> that they think it should be enshrined in DocBook, they can turn it into
> a documentation patch.

+1.  I'd personally like to see improvements to the tutorials, and
patches could certainly be submitted or specific ideas discussed over on
-docs.

A few ideas around that would be:

- Setting up async replication
- Setting up sync replication, with quorum-based sync
- Cascading replication
- Parallel pg_dump-based backup/restore (with pg_dumpall for globals)
- Using various important extensions (pg_stat_statements,
  pg_buffercache, pageinspect, pg_freespacemap, pg_visibility)
- Using pg_basebackup to build replicas
- Using pg_receivewal to have a WAL archive

Of course, there's a lot of additional tutorials that would be nice to
have which go beyond what's in core and leverage tools like pgbouncer,
pgbackrest, patroni, etc, but they'd go on the wiki or elsewhere since
they would necessarily be referring to bits that are outside of PG core.
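
As an example of the sort of thing the pg_basebackup tutorial could
show (host, user and path are placeholders):

  pg_basebackup -h primary.example.com -U replicator \
      -D /var/lib/postgresql/10/main -R -X stream

which builds a replica data directory and writes out a recovery.conf
with primary_conninfo already filled in.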

Thanks!

Stephen




Re: Improving pg_dump performance

2018-07-23 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> We've got an old (v8.4.17, thus no parallel backups) 2.9TB database that
> needs to be migrated to a new data center and then restored to v9.6.9.

You should be using 9.6's pg_dump to perform the export.  Might be a bit
annoying to do, but you should be able to install it on to a nearby
server or the same server as 8.4 is running on but in another location.

With 9.6's pg_dump, you could use parallel mode, but you have to prevent
anything from changing the data between when the first connection from
pg_dump is made until all of the connections have completed and started
their transactions (should be just a few seconds, really).  Of course,
that export won't include any changes after the pg_dump starts, so
you'll need a way to manage those.
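
A sketch of what that could look like (paths and job count are
placeholders; --no-synchronized-snapshots is required when dumping a
pre-9.2 server, which is exactly why nothing can be allowed to change
while the worker connections start up):

  /usr/lib/postgresql/9.6/bin/pg_dump -h old-server -d $DB \
      -Fd -j 4 --no-synchronized-snapshots \
      -f ${TARGET}/${DATE}_${DB}.dumpdir

The matching pg_restore on the 9.6 side can then use -j as well to
parallelize the import.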

> The database has many large tables full of bytea columns containing pdf
> images, and so the dump file is going to be more than 2x larger than the
> existing data/base...

The dump file isn't going to include any content from indexes and, at
least looking at some PDFs locally, they can certainly be compressed
effectively sometimes, and they might be getting compressed today in
your 8.4 instance thanks to TOAST, and more to the point, the textual
representation of a bytea which will end up in the export would almost
certainly be compressible too.

> The command is:
> $ pg_dump -v -Z0 -Fc $DB --file=${TARGET}/${DATE}_${DB}.dump 2>
> ${DATE}_${DB}.log
> 
> Using -Z0 because pdf files are already compressed.

They aren't really PDF files in the export though- they're bytea's
represented in a textual format.  Have you tested the difference between
using -Z0 and allowing compression to happen?

> Because of an intricate web of FK constraints and partitioned tables, the
> customer doesn't trust a set of "partitioned" backups using --table= and
> regular expressions (the names of those big tables all have the year in
> them), and so am stuck with a single-threaded backup.

All of the FKs will be re-checked when the data is imported into the new
instance.

> Are there any config file elements that I can tweak (extra points for not
> having to restart postgres) to make it run faster, or deeper knowledge of
> how pg_restore works so that I could convince them to let me do the
> partitioned backups?

pg_restore isn't doing much more than restoring what's in the backup
into the database using COPY commands.  Since it's an export/import, all
the FKs and other constraints will be re-checked and all indexes will be
rebuilt during the import.

> Lastly, is there any way to not make the backups so large (maybe by using
> the --binary-upgrade option, even though the man page says, "in-place
> upgrades only")?

You could possibly upgrade the existing system from 8.4 to 9.6 in-place
(which would require a bit of downtime but typically on the order of
minutes instead of many hours) and then take a filesystem-level backup
using a tool like pgBackRest and then restore that at the new data center as a
replica and use streaming replication until you're ready to cut over to
the new data center.  That's probably how I'd tackle this anyway, though
one nice thing about the dump/restore is that you could have checksums
enabled on the new cluster.
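
Roughly, the in-place jump would be along these lines (only a sketch- the
binary and data directory paths are placeholders, and you'd want a
verified backup before doing it for real):

  $ /usr/lib/postgresql/9.6/bin/pg_upgrade -k \
      -b /usr/lib/postgresql/8.4/bin -B /usr/lib/postgresql/9.6/bin \
      -d /var/lib/postgresql/8.4/data -D /var/lib/postgresql/9.6/data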

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Improving pg_dump performance

2018-07-23 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> An interesting idea.  To clarify: it's possible to parallel backup a running
> 8.4 cluster remotely from a 9.6 system?

Yes, you can do a parallel backup, but you won't be able to get a
consistent snapshot.  You'll need to pause all changes to the database
while the pg_dump processes connect and start their transactions to
have the backup be consistent.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Replication failure, slave requesting old segments

2018-08-11 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Dear Experts,

Since you're asking ...

> I recently set up replication for the first time.  It seemed to be 
> working OK in my initial tests, but didn't cope when the slave was 
> down for a longer period.  This is all with the Debian stable 
> packages of PostgreSQL 9.6.  My replication setup involves a third 
> server, "backup", to and from which WAL files are copied using scp 
> (and essentially never deleted), plus streaming replication in 
> "hot standby" mode.
> 
> On the master, I have:
> 
> wal_level = replica
> archive_mode = on
> archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
>scp %p backup:backup/postgresql/archivedir/%f'

This is really not a sufficient or particularly intelligent
archive_command.  In general, I'd strongly recommend against trying to
roll your own archiving/backup solution for PostgreSQL.  In particular,
the above is quite expensive, doesn't address the case where a file is
only partially copied to the backup server, and doesn't ensure that the
file is completely written and fsync'd on the backup server meaning that
a failure will likely result in WAL going missing.  There's much better
solutions out there and which you really should be using instead of
trying to build your own.  In particular, my preference is pgbackrest
(though I'm one of the folks in its development, to be clear), but
there's other options such as barman or WAL-E which I believe also
address the concerns raised above.
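
For instance, with pgbackrest the archiving side boils down to something
like this in postgresql.conf (the stanza name 'main' is just an example):

  archive_command = 'pgbackrest --stanza=main archive-push %p'

and on the standby, in recovery.conf:

  restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'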

> On the slave I have:
> 
> standby_mode = 'on'
> primary_conninfo = 'user=postgres host=master port=5432'
> restore_command = 'scp backup:backup/postgresql/archivedir/%f %p'
> 
> hot_standby = on
> 
> 
> This all seemed to work OK until the slave was shut down for a few days.  
> (During this time the master was also down for a shorter period.) 
> When the slave came back up it successfully copied several WAL files 
> from the backup server until it reached one that hasn't been created yet:

That all sounds fine.

> 2018-08-10 22:37:38.322 UTC [615] LOG:  restored log file 
> "0001000700CC" from archive
> 2018-08-10 22:38:02.011 UTC [615] LOG:  restored log file 
> "0001000700CD" from archive
> 2018-08-10 22:38:29.802 UTC [615] LOG:  restored log file 
> "0001000700CE" from archive
> 2018-08-10 22:38:55.973 UTC [615] LOG:  restored log file 
> "0001000700CF" from archive
> 2018-08-10 22:39:24.964 UTC [615] LOG:  restored log file 
> "0001000700D0" from archive
> scp: backup/postgresql/archivedir/0001000700D1: No such file or 
> directory
> 
> At this point there was a temporary problem due to a firewall rule that 
> blocked the replication connection to the master:
> 
> 2018-08-10 22:41:35.865 UTC [1289] FATAL:  could not connect to the primary 
> server: SSL connection has been closed unexpectedly

Well, that's certainly not good.

> The slave then entered a loop, retrying to scp the next file from the backup 
> archive (and failing) and connecting to the master for streaming replication 
> (and also failing).  That is as expected, except for the "invalid magic 
> number":
> 
> scp: backup/postgresql/archivedir/0001000700D1: No such file or 
> directory
> 2018-08-10 22:41:37.018 UTC [615] LOG:  invalid magic number  in log 
> segment 0001000700D1, offset 0
> 2018-08-10 22:43:46.936 UTC [1445] FATAL:  could not connect to the primary 
> server: SSL connection has been closed unexpectedly

Note that the invalid magic number error is just LOG-level, and that
PostgreSQL will certainly ask for WAL files which don't exist in the
archive yet.

> Note that it's complaining about an invalid magic number in log segment D1, 
> which is 
> the one that has just failed to scp.  That looked suspicious to me and I 
> remembered 
> notes in the docs about ensuring that the archive and/or restore commands 
> return an 
> error exit status when they fail; I checked that and yes scp does exit(1) 
> when the 
> requested file doesn't exist:

Sure, but scp doesn't produce any output when it's successful, so are
you sure that an scp wasn't performed after the "no such file or
directory" log message and which perhaps performed a partial copy
(see above about *exactly* that concern being mentioned by me, even
before getting down to this part, due to using these simple unix
commands...)?

> $ scp backup:/egheriugherg /tmp/a
> scp: /egheriugherg: No such file or directory
> $ echo $?
> 1

scp has no idea if it made a copy of a partial file though.

> Anyway, the slave continued in its retrying loop as expected, except that 
> each time 
> it re-fetched the previous segment, D0; this seems a bit peculiar:
> 
> 2018-08-10 22:44:17.796 UTC [615] LOG:  restored log file 
> "0001000700D0" from archive
> scp: backup/postgresql/archivedir/0001000700D1: No such file or 
> directory

Re: Replication failure, slave requesting old segments

2018-08-12 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Stephen Frost wrote:
> >* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>archive_command = 'ssh backup test ! -f backup/postgresql/archivedir/%f &&
> >>   scp %p backup:backup/postgresql/archivedir/%f'
> >
> >This is really not a sufficient or particularly intelligent
> >archive_command.  In general, I'd strongly recommend against trying to
> >roll your own archiving/backup solution for PostgreSQL.  In particular,
> >the above is quite expensive, doesn't address the case where a file is
> >only partially copied to the backup server, and doesn't ensure that the
> >file is completely written and fsync'd on the backup server meaning that
> >a failure will likely result in WAL going missing.  There's much better
> >solutions out there and which you really should be using instead of
> >trying to build your own.  In particular, my preference is pgbackrest
> >(though I'm one of the folks in its development, to be clear), but
> >there's other options such as barman or WAL-E which I believe also
> >address the concerns raised above.
> 
> Hmmm, well I based it on what I read in the documentation:
> 
>   https://www.postgresql.org/docs/9.6/static/continuous-archiving.html
> 
> The expense is not an issue in this case.  I would be surprised if
> scp were to exit(0) after only partially copying a file but perhaps
> you know something I don't about that.

The scp might not, but so what?  If a partial file remains on the far
side, then a subsequent invocation of that archive_command will come
back saying that the file is there.  Further, the file isn't going to
actually be sync'd to the filesystem on the far side, so even if the scp
returns success, a failure on the backup server could result in loss of
the WAL file.

> I know that rsync creates a temporary file and moves it into place when
> all the data has been transferred, which scp doesn't do; perhaps that
> would be safer.

No, what would be better is using a tool which will actually fsync() the
file on the backup server after writing it out there, making sure it's
been stored to the filesystem, and then to perform an atomic mv
operation to put it in place, and then to fsync() the directory to
ensure that change has also been sync'd to disk.

Even better is to calculate a hash of the file and to reverify that on
restore, to make sure that it doesn't end up getting corrupted on the
backup server somehow.
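
For illustration only, here's a minimal sketch of the sort of thing such
a tool has to do (temporary name, fsync, checksum verification, atomic
rename); it assumes a coreutils sync on the backup host that accepts file
arguments (8.24 or later), and it's still no substitute for
pgbackrest/barman/WAL-E:

  #!/bin/sh
  # safer-archive.sh %p %f  -- sketch, not production code
  set -e
  src="$1"; fname="$2"
  dir=backup/postgresql/archivedir

  # refuse to overwrite a segment that has already been archived
  ssh backup "test ! -f $dir/$fname"

  # copy under a temporary name, fsync it on the far side, verify the
  # checksum, then atomically rename and fsync the directory
  scp "$src" "backup:$dir/$fname.tmp"
  sum=$(sha256sum "$src" | awk '{print $1}')
  ssh backup "sync $dir/$fname.tmp"
  ssh backup "echo '$sum  $dir/$fname.tmp' | sha256sum -c -"
  ssh backup "mv $dir/$fname.tmp $dir/$fname && sync $dir"

with archive_command = '/path/to/safer-archive.sh %p %f' on the primary.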

> The master was only writing WAL files maybe 3 or 4 times per day at
> the time, so any scenario that requires its crash to have occurred
> exactly as it was doing an scp involves a substantial coincidence.

While I get that it's unlikely to be relevant to this particular case, I
wouldn't recommend depending on never having that coincidence happen.
What I've seen quite a bit of, just so you know, is that things tend to
all break down at about the same time- so right when something bad is
happening on the primary, the backup server runs into issues too.  This
might be due to networking breaking down in a particular area all at
once or shared storage having some issue but it's happened often enough
that relying on uncommon things to avoid each other is something I've
got a hard time doing these days.

> I am limited in what I can run on the backup server.

You could consider backing up locally using a tool which is built for
PostgreSQL and then sync'ing the results of that backup over to the
backup server, perhaps.  Or point out that you need a backup server
where you don't have those restirctions in order to properly have
backups.  Or use an external service like s3 (or a clone) to back up to.

> >Having it go back to D0 is at least a bit interesting.  I wonder if
> >somehow it was partial..  Have you looked at the files on the archive
> >server to make sure they're all 16MB and the same size?
> 
> Yes, all the files on the backup including D0 were 16 MB.  But:

Ok.  Did you see any failures on the sending side when archive_command
was being run?  PG shouldn't start trying to archive the file until it's
done with it, of course, so unless something failed right at the end of
the scp and left the file at 16MB but it wasn't actually filled..

> >>2018-08-11 00:12:15.536 UTC [7954] LOG:  restored log file 
> >>"0001000700D0" from archive
> >>2018-08-11 00:12:15.797 UTC [7954] LOG:  redo starts at 7/D0F956C0
> >>2018-08-11 00:12:16.068 UTC [7954] LOG:  consistent recovery state reached 
> >>at 7/D0FFF088
> 
> Are the last two log lines above telling us anything useful?  Is that
&

Re: Replication failure, slave requesting old segments

2018-08-12 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> OK.  I think this is perhaps a documentation bug, maybe a missing 
> warning when the master reads its configuration, and maybe (as you say) 
> a bad default value.

If we consider it to be an issue worthy of a change then we should
probably just change the default value, and maybe not even allow it to
be set lower than '1'.

> Specifically, section 26.2.5 of the docs says:
> 
> "If you use streaming replication without file-based continuous archiving, 
> the server might recycle old WAL segments before the standby has received 
> them. If this occurs, the standby will need to be reinitialized from a new 
> base backup. You can avoid this by setting wal_keep_segments to a value 
> large enough to ensure that WAL segments are not recycled too early, or by 
> configuring a replication slot for the standby. If you set up a WAL archive 
> that's accessible from the standby, these solutions are not required, since 
> the standby can always use the archive to catch up provided it retains enough 
> segments."
> 
> OR, maybe the WAL reader that process the files that restore_command fetches 
> could be smart enough to realise that it can skip over the gap at the end?

That strikes me as a whole lot more complication in something we'd
rather not introduce additional complications into without very good
reason.  Then again, there was just a nearby discussion about how it'd
be nice if the backend could realize when a WAL file is complete, and I
do think that'll be more of an issue when users start configuring larger
WAL files, so perhaps we should figure out a way to handle this..

> Anyway.  Do others agree that my issue was the result of wal_keep_segments=0 ?

Yeah, I've not spent much time actually looking at code around this, so
it'd be nice to get:

a) a reproducible case demonstrating it's happening
b) testing to see how long it's been this way
c) if setting wal_keep_segments=1 fixes it
d) perhaps some thought around if we could address this some other way

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 08/12/2018 03:54 PM, Stephen Frost wrote:
> >* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>OK.  I think this is perhaps a documentation bug, maybe a missing
> >>warning when the master reads its configuration, and maybe (as you say)
> >>a bad default value.
> >
> >If we consider it to be an issue worthy of a change then we should
> >probably just change the default value, and maybe not even allow it to
> >be set lower than '1'.
> 
> I would say leave the default at 0 as it leaves no doubt that you are
> performing without a net. A setting of '1' implies you are covered and for a
> fast moving cluster or slow moving one with sufficient downtime that would
> not be the case. Better to let the end user know this is not a simple
> problem and some thought needs to go into configuration.

Uh, this specific case is where there *is* a 'safety net' though-
archive command and restore command were configured and being used, so I
don't buy off on this argument at all.

Maybe we just internally bump wal_keep_segments to '1' to avoid this
specific risk without actually changing the default or making people
change their existing configurations, but if this is really what's
happening then I don't think the answer is "don't do anything."

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Adrian Klaver wrote:
> >On 08/12/2018 03:54 PM, Stephen Frost wrote:
> >>Greetings,
> >>
> >>* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>>OK.  I think this is perhaps a documentation bug, maybe a missing
> >>>warning when the master reads its configuration, and maybe (as you say)
> >>>a bad default value.
> >>
> >>If we consider it to be an issue worthy of a change then we should
> >>probably just change the default value, and maybe not even allow it to
> >>be set lower than '1'.
> >
> >I would say leave the default at 0 as it leaves no doubt that you are
> >performing without a net. A setting of '1' implies you are covered and for
> >a fast moving cluster or slow moving one with sufficient downtime that
> >would not be the case.
> 
> Can you explain how it can fail in the case of a "slow moving cluster with
> sufficient downtime"?

I'm guessing 'sufficient downtime' here is, basically, 'offline until
the next checkpoint', which isn't actually all *that* much time.

> It seems to me that if I have correctly understood what happened in this
> case then 0, the default, really cannot ever work properly when you have
> enabled WAL archiving plus streaming.

Well, it's not like it'd work without WAL archiving either, though
that's perhaps more obviously true.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> Adrian Klaver wrote:
> >On 08/12/2018 02:56 PM, Phil Endecott wrote:
> >>Anyway.  Do others agree that my issue was the result of
> >>wal_keep_segments=0 ?
> >
> >Only as a sub-issue of the slave losing contact with the master. The basic
> >problem is maintaining two separate operations, archiving and streaming,
> >in sync. If either or some combination of both lose synchronization then
> >it is anyone's guess on what is appropriate for wal_keep_segments.

Uh, no, having an archive_command and a restore_command configured
should remove the need to worry about what wal_keep_segments is
set to because anything not on the primary really should be available
through what's been archived and PG shouldn't have any trouble figuring
that out and working with it.

If all you've got is streaming replication then, sure, you have no idea
what to set wal_keep_segments to because the replica could be offline
for an indeterminate amount of time, but as long as you're keeping track
of all the WAL through archive_command, that shouldn't be an issue.

> Really?  I thought the intention was that the system should be
> able to recover reliably when the slave reconnects after a
> period of downtime, subject only to there being sufficient
> network/CPU/disk bandwidth etc. for it to eventually catch up.

Yes, that's correct, the replica should always be able to catch back up
presuming there's no gaps in the WAL between when the replica failed and
where the primary is at.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 08/13/2018 05:39 AM, Stephen Frost wrote:
> >* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> >>Adrian Klaver wrote:
> >>>On 08/12/2018 02:56 PM, Phil Endecott wrote:
> >>>>Anyway.  Do others agree that my issue was the result of
> >>>>wal_keep_segments=0 ?
> >>>
> >>>Only as a sub-issue of the slave losing contact with the master. The basic
> >>>problem is maintaining two separate operations, archiving and streaming,
> >>>in sync. If either or some combination of both lose synchronization then
> >>>it is anyone's guess on what is appropriate for wal_keep_segments.
> >
> >Uh, no, having an archive_command and a restore_command configured
> >should remove the need to worry about what wal_keep_segments is
> >set to because anything not on the primary really should be available
> >through what's been archived and PG shouldn't have any trouble figuring
> >that out and working with it.
> >
> >If all you've got is streaming replication then, sure, you have no idea
> >what to set wal_keep_segments to because the replica could be offline
> >for an indeterminate amount of time, but as long as you're keeping track
> >of all the WAL through archive_command, that shouldn't be an issue.
> 
> Therein lies the rub. As I stated previously the bigger issue is syncing two
> different operations, archiving and streaming. 

That's not correct though, there isn't a big issue regarding syncing of
those two operations.

> The OP got caught short
> assuming the archiving would handle the situation where the streaming was
> down for a period. In his particular setup and for this particular situation
> a wal_keep_segments of 1 would have helped. I do not see this as a default
> value though as it depends on too many variables outside the reach of the
> database, mostly notably the success of the archive command. 

What's been pointed out here is that even if everything is working
(archive_command, restore_command, etc) there's a possible gap in the
transisition from replay-from-archive and starting streaming replication
where the replica might not be able to start streaming.

That's an entirely *PostgreSQL* issue, as far as I see it, and hasn't
got anything to do with his particular setup except that he managed to
expose the issue.

> First is the
> command even valid, two is the network link reliable, three is there even a
> network link, is there more then one network link, four is the restore
> command valid? That is just of the top of my head, more caffeine and I could
> come up with more. Saying that having archiving, streaming and a
> wal_keep_segments=1 has you covered, is misleading. I don't see it as
> detrimental to performance but I do see more posts down the road from folks
> who are surprised when it does not cover their case. Personally I think it
> better to be up front that this requires more thought or a third party
> solution that has done the thinking.

This is all down to "and you should also monitor to make sure things
continue working" which I certainly agree with but that doesn't mean we
shouldn't fix this issue.

This entire side-discussion feels like it's really off in the weeds.
The next steps which I outlined a while ago seem to still be entirely
appropriate and we should figure out a way to solve this issue so that,
when everything else is working (archive command, restore command,
replica is able to connect to the primary, etc), that PG behaves sanely
and is able to catch up with and connect to the primary and resume
streaming.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Replication failure, slave requesting old segments

2018-08-13 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 08/13/2018 05:08 AM, Phil Endecott wrote:
> >Adrian Klaver wrote:
> >Really?  I thought the intention was that the system should be
> >able to recover reliably when the slave reconnects after a
> >period of downtime, subject only to there being sufficient
> >network/CPU/disk bandwidth etc. for it to eventually catch up.

That's correct.

> See also my reply to Stephen earlier. Basically you are trying to coordinate
> two different operations. They start from the same source pg_xlog(pg_wal
> 10+) but arrive on a different time scale and from different locations.
> Without sufficient sanity checks it is possible they diverge enough on one
> or both paths to render the process unstable.

This isn't what's happening.  We're not talking about a timeline change
here or a replica being promoted to be a primary in general.  There's no
diverging happening- it's the same consistent WAL stream, just coming
from two different sources, which PG is specifically designed to handle
and should be handling seamlessly.

> I would say that:
> 
> "If you set up a WAL archive that's accessible from the standby, these
> solutions are not required, since the standby can always use the archive to
> catch up provided it retains enough segments."
> 
> should be more like:
> 
> "If you set up a WAL archive that's accessible from the standby, these
> solutions are not required, since the standby can always use the archive to
> catch up provided it retains enough segments. *This is dependent on
> verification that the archiving is working properly. A belt and suspenders
> approach would be to set wal_keep_segments to a value > 0 in the event
> archiving is not properly functioning*"
> "

I don't think I can disagree more with this additional wording, and I
*really* don't think we should be encouraging people to set a high
wal_keep_segments.  The specific case here looks like it just need to be
set to, exactly, '1', to ensure that the primary hasn't removed the last
WAL file that it archived.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Duplicating data folder without tablespace, for read access

2018-08-14 Thread Stephen Frost
Greetings,

* Jack Cushman (jcush...@gmail.com) wrote:
> I have a large database of text, with a 600GB table and a 100GB table
> connected by a join table. They both see occasional updates throughout the
> week. Once a week I want to "cut a release," meaning I will clone just the
> 100GB table and copy it to a "release" server for read-only web access.

My general recommendation to people who are thinking about something
like this is to use their restore-tests as a way to stage things (you
are testing your backups by doing a restore, right?) and then copy over
the results.

> My procedure would be:
> 
> - keep the 600GB table on a separate tablespace
> - cleanly stop postgres on both servers
> - copy the data folder to the release server
> - delete pg_tblspc/* on the release server
> - start postgres on both servers

So, instead of that procedure, it'd be:

- Back up the database as per usual
- Restore the database somewhere
- Run some sanity checks on the restored database
- go in and drop the table and sanitize anything else necessary
- Shut down the database and copy it into place
- OR take a new backup of the sanitized database and then restore it
  into place

Much cleaner, and tests your backup/restore process.
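
As a sketch of that flow (assuming, say, a pgbackrest stanza named 'main',
a staging box that can reach the repository, and made-up paths and object
names):

  # on the staging box, with its cluster stopped
  $ pgbackrest --stanza=main --delta restore
  $ pg_ctl -D /var/lib/postgresql/10/main start
  $ psql -d mydb -c 'DROP TABLE big_600gb_table'
  $ psql -d mydb -c 'VACUUM ANALYZE'
  # run sanity checks, stop the cluster, then copy or re-backup the
  # sanitized data directory over to the release server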

Alternatively, you could just track changes to the "main" database using
triggers into an audit log and then replay the changes made to the 100GB
table into the other database.

> In local testing this seems to work -- the release server works fine, and I
> only get an error message if I try to access the missing tables, which is
> expected. But are there reasons this is going to bite me if I try it in
> production? I'm hoping it helps that (a) I'm only doing read access, (b) I
> can cleanly stop both servers before cutting a release, and (c) I'm not
> worried about losing data, since it's just an access copy.

Still, it's a hacked up and not entirely proper PG database which will
likely lead to confusion- maybe you won't be confused, but I strongly
suspect others looking at it will be, and you might run into other
issues too along the lines of what Tom mentioned (background jobs
failing and such).

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pg_basebackup failed to read a file

2018-08-14 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> On 08/14/2018 11:14 AM, Tom Lane wrote:
> >Mike Cardwell  writes:
> >>pg_basebackup: could not get write-ahead log end position from server:
> >>ERROR:  could not open file "./postgresql.conf~": Permission denied
> >>Now, I know what this error means. There was a root owned file at
> >>"/var/lib/pgsql/10/data/postgresql.conf~" which contained an old
> >>version of our postgres config and was not readable by the postgres
> >>user. I'll delete this file and try again. However, in the mean time: I
> >>feel like it would be useful for pg_basebackup to check that it has
> >>read access to all of the existing files in the source directory at the
> >>start, before it begins it's copy.
> >That seems like a pretty expensive thing to do, if there are lots of
> >files ... and you'd still end up failing, so it's not moving the ball
> >very far.
> 
> Why is checking a bunch of file permissions anywhere close to being as
> expensive as transferring 1.5TB over a WAN link?

One could argue that the cost would be borne by everyone who is using
pg_basebackup and not just those users who are transferring 1.5TB over a
WAN link.

That said, pgbackrest always builds a full manifest by scanning all of
the directories, tablespaces, files, etc, and I can't recall anyone ever
complaining about it.  Certainly, failing fast would be better than
failing after a lot of time has been spent.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct plan

2018-08-15 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Unless there are substantial objections, or nontrivial changes as a result
> of this round of comments, we anticipate making the CoC official as of
> July 1 2018.

We seem to be a bit past that timeline...  Do we have any update on when
this will be moving forward?

Or did I miss something?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: During promotion, new master tries to archive same segment twice

2018-08-16 Thread Stephen Frost
Greetings,

* Phil Endecott (spam_from_pgsql_li...@chezphil.org) wrote:
> David Steele wrote:
> >pgBackRest has done this for years and it saves a *lot* of headaches.
> 
> The system to which I am sending the WAL files is a rsync.net
> account.  I use it because of its reliability, but methods for
> transferring files are limited largely to things like scp and
> rsync.

You might consider using a well-tested PG backup tool locally and then
simply rsync that backup over to rsync.net.  Certainly with pgbackrest,
we intend and expect people to use more routine "just copy all the
files" backup methods for backing up the repository.

Basically, you could pgbackrest to /some/local/path and then rsync from
there over to rsync.net.
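
Something along these lines, in other words (the stanza name and remote
host are placeholders- /var/lib/pgbackrest is simply the default repo
location):

  $ pgbackrest --stanza=main --type=diff backup
  $ rsync -a --delete /var/lib/pgbackrest/ user@rsync.net:pgbackrest-repo/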

Having an rsync or scp 'storage' option (similar to the s3 one) for
pgbackrest might be interesting..  We discussed having the ability to
start a backup from the PG server at one point but in that discussion
we were thinking pgbackrest would also be installed on the backup
server.  This would be different from that in that the remote side would
only need to support rsync or scp.  You'd have to accept that if the
backup server dies then you lose data though, since I don't believe
there's a way to ask for an fsync() through rsync or scp, which makes it
a much less compelling feature, unless rsync.net guarantees writes
somehow..?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pg_upgrade (and recovery) pitfalls

2018-08-16 Thread Stephen Frost
Greetings,

* PO (gunnar.bl...@pro-open.de) wrote:
> Consider the following scenario/setup:
> - 4 DB servers in 2 DCs
>   - 1 primary (in DC1)
>   - 1 sync secondary (in other DC)
>   - 2 async secondaries (distributed over DCs)

I'm a bit surprised that you're ok with the latency imposed by using
sync replication to another data center.  I'm guessing they're pretty
close to each other?

> - General requirements are:
>   - *always* have one sync secondary online (no exceptions)

Well, you kind of have to or everything stops. ;)

> The "naive" idea was to shutdown all instances (starting with the primary to 
> enable final syncs), run "pg_upgrade -k" on both the former primary and the 
> former sync secondary, re-link the recovery.conf on the secondary, re-enable 
> the "primary" IP and start both.
> 
> D'oh! The secondary is complaining about a different cluster identifier:
> "FATAL:  database system identifier differs between the primary and standby"

No, you can't do that.

> (From looking at the code, I could not determine straight away when and how 
> this identifier is generated, but I guess it is somewhere in initdb.c?)

Yes.

> So, as we can't rsync (no ssh...), which would probably finish in a few 
> seconds, a pg_basebackup is due. Which can be a PITA when the DB size is 
> scraping on a TB and you have a single 1GB/sec connection. Bye, bye, 
> availability (remember the primary requirement?).

The rsync *might* finish quickly but it depends a lot on the specifics
of your environment- for example, the rsync method doesn't do anything
for unlogged tables, so if you have large unlogged tables you can end up
with them getting copied over and that can take a long time, so, some
prep work should be done to make sure you nuke any unlogged tables
before you go through with the process (or do something similar).

pg_basebackup has the unfortunate issue that it's single-threaded,
meaning that enabling compression probably will cause the system to
bottle-neck on the single CPU before reaching your 1Gb/s bandwidth
limit anyway.  You could parallelize the backup/restore using pgbackrest
or, in recent versions I think, with barman, and that should at least
get you to be able to fill the 1Gb/s pipe with compressed data for the
backup.  You're likely still looking at an hour or more though to get
all that data copied over that small a pipe.
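
For reference, the parallel backup with pgbackrest is just something like
this (the stanza name and worker count are placeholders):

  $ pgbackrest --stanza=main --type=full --process-max=4 backup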

> ==> So by now, we're only pg_upgrade'ing the primary and follow up with a 
> pg_basebackup to the secondary, planning for much longer downtimes. <==

I have to say that I probably would argue that you should really have at
least two replicas in the same DC as the primary and then use
quorum-based synchronous replication.  Presumably, that'd also increase
the bandwidth available to you for rebuilding the replica, reducing the
downtime associated with that.  That might also get you to the point
where you could use the rsync method that's discussed in the pg_upgrade
docs to get the replicas back online.
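
With the quorum syntax available in PG 10 and later, that would look
something like this in postgresql.conf (the standby application_names
here are placeholders):

  synchronous_standby_names = 'ANY 1 (replica1, replica2)'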

> After finishing the pg_basebackup, re-link the recovery.conf, start. 
> The recovery finds history-files from higher timelines in the archive, starts 
> to recover those (?) and then complains that the timeline doesn't match 
> (don't take the numbers here too seriously, this is from a low-traffic test 
> system, the fork off TL 1 was at least a year ago):
> 
> restored log file "0002.history" from archive
> restored log file "0003.history" from archive
> restored log file "0004.history" from archive
> FATAL:  requested timeline 3 is not a child of this server's history
> DETAIL:  Latest checkpoint is at 9C/36044D28 on timeline 1, but in the 
> history of the requested timeline, the server forked off from that timeline 
> at 69/8800.
> 
> This mess can probably be cleaned up manually (delete the 00[234].history 
> etc. on both the secondary and the BARMAN archive), however to be 100% safe 
> (or when you're unexperienced), you take another basebackup :/

Whoa.  No, this isn't good- once you've done a pg_upgrade, you're
on a *new* cluster, really.  There's no playing forward between an old
PG server and a new one that's been pg_upgrade'd and you should really
be using a tool that makes sure you can't end up with a messed up
archive like that.  What seems to be happening here is that your restore
command is trying to pull from the *old* server's WAL and history files
and that's *wrong*.

pgbackrest has a way to handle this and keep the stanza name the same by
using a 'stanza-upgrade', but in no case should a restore command be
pulling WAL files (of any sort) from the archive of a server with a
different system identifier.  pgbackrest won't let that happen.

> And - after moving the *.history files out of the way in the archive - the 
> secondary finally starts and starts receiving WALs.

That's really grotty. :(

> Sidenote: this second problem happened to us as well after a colleague 
> promoted a secondary for some R/W tests w/out first disabling t

Re: pg_upgrade (and recovery) pitfalls

2018-08-17 Thread Stephen Frost
Greetings,

* PO (gunnar.bl...@pro-open.de) wrote:
> Stephen Frost – Thu, 16. August 2018 19:00
> > * PO (gunnar.bl...@pro-open.de) wrote:
> > > - why does a recovery, based on a recovery.conf that points to a reachable
> > primary (which obviously communicates its own timeline), still look for 
> > higher
> > timelines' history-files in the archive and tries to jump onto these
> > timelines? This doesn't seem reasoable to me at all...
> > 
> > PG is going to start from the current timeline and try to find all the
> > timelines that it could possibly play forward to and at what point the
> > timeline changes were done and then it's going to figure out which
> > timeline to go to (by default we try to stick with the current timeline,
> > but you can configure recovery.conf to specify a different timeline or
> > 'latest') and then it's going to request the WAL to get from where PG is
> > to the end of whichever timeline it thinks you want. That's all
> > entirely reasonable and how things are supposed to work.
> > 
> > Only once PG reaches the end up what's available through the restore
> > command does it start trying to talk to the primary. There's been some
> > discussion about how it might be nice to be able to configure PG to
> > prefer going to the primary instead, though, really, it should typically
> > be faster to replay WAL from a restore_command than to get it from the
> > primary over the network, not to mention that getting it from the
> > primary will introduce some additional load on the system.
> 
> Fair enough, and I only just realised I've been carrying a 
>   recovery_target_timeline = 'latest'
> around from my predecessors. :facepalm:

That isn't necessarily a bad thing to have, especially in environments
where you're promoting replicas to be primaries and flipping things
around.

> I owe you a beer or X (in Lisbon?)!

Yes, I'll be in Lisbon, would be happy to chat over a beer.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: CTE with JOIN of two tables is much faster than a regular query

2018-08-18 Thread Stephen Frost
Greetings,

* kpi6...@gmail.com (kpi6...@gmail.com) wrote:
> The CTE mentioned below completes the query in 4.5 seconds while the regular
> query takes 66 seconds. I read from EXPLAIN ANALYSE that the regular query
> starts with a full table scan over "Doc" while the CTE joins the two tables
> first and applies the filter condition in the 2nd step. 
> 
> I believe that some rows in "Doc" which are not referenced by "F" contain a
> large amount of data in the field "szText" and this will slow down the ILIKE
> operator. 

Yup, that appears to be what's happening.

> What can I do to improve the performance of the regular query without using
> a CTE? 

You could possibly build a trigram index on the field you're searching,
which could avoid the full table scan.  Of course, that index could be
quite large, so there's downsides to that.  If these are words you're
looking for then you could use PG's full text indexing to build indexes
on the words and then use that instead.  If you are fine working with
words but are concerned about misspellings then you can extract out the
distinct words, build a trigram index on those, find the most similar
words based on the input and then search for those words using the FTI.
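
A hedged sketch of the trigram approach, using the table and column names
from your query (the index name and database name are made up); a GIN
trigram index like this can then be used by ILIKE '%...%' searches:

  $ psql -d mydb -c 'CREATE EXTENSION IF NOT EXISTS pg_trgm'
  $ psql -d mydb -c 'CREATE INDEX CONCURRENTLY doc_sztext_trgm_idx ON "Doc" USING gin ("szText" gin_trgm_ops)'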

Unfortunately, we don't currently pay attention to things like average
string length when considering the cost of performing an 'ilike', so we
figure that doing the filtering first and then the join will be faster,
but that obviously falls over in some cases, like this one.  Using the
CTE forces PG to (today, at least) do the join first, but that isn't
really good to rely on.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: upgrading from pg 9.3 to 10

2018-08-20 Thread Stephen Frost
Greetings,

* bricklen (brick...@gmail.com) wrote:
> On Tue, Aug 14, 2018 at 10:45 AM Edmundo Robles 
> wrote:
> > Is safe  to upgrade from pg 9.3 to pg 10 directly using pg_upgrade or
> > is better upgrade, with pg_upgrade,  from  9.3 -> 9.4 ->9.5 -> 9.6 -> 10.
> 
> Using pg_upgrade, it is definitely possible to upgrade 9.3 to 10 in one
> jump. We did this in production earlier this year for 1500 Postgres
> clusters.
> At https://bricklen.github.io/2018-03-27-Postgres-10-upgrade/ I documented
> (in excrutiating detail, perhaps) how we upgraded those clusters from 9.3
> to 10 with sub-15 minute downtime per cluster. The only real issues we ran
> into were some corrupted indexes that appeared to be related to 10.1 and
> 10.2. Upgrading to 10.3/10.4 and reindexing fixed those particular
> corrupted indexes.

Interesting write-up.  A few points:

#1: I'd strongly recommend including something in the write-up about
checking for unlogged tables.  Those can add quite a bit of time to the
rsync if they aren't truncated before the primary cluster is shut down.

#2: The issue with timelines leads me to suspect that you had a
restore_command defined and that when PG started up, it found the
timeline history files from the old cluster.  If you don't have a
restore_command set for any of these then I'm very curious what
happened.  The other possibility (though I wouldn't have expected a
timeline complaint from PG...) is that the replica wasn't fully up to
date for whatever reason.

#3: There's a number of checks discussed in the upgrade documentation
around using the rsync-based method, but it doesn't seem like you did
those.  A mention of them might be good.  Note that these are
particularly important because the rsync will *not* copy over changes to
the data files except in the relatively rare case of the relfilenode's
size changing (given that most of them are 1G, that's not too likely).
The note you have about the rsync taking more time due to "if the
remote replica was fully caught up when it was shut down" isn't
accurate- there is no WAL replay that happens on the replica using this
method to 'catch up' and if WAL replay was required to have this process
be correct then it simply wouldn't ever work.

#4: pg_upgrade absolutely resets the timeline to '1', and you shouldn't
ever copy over history files from the old cluster to the new cluster.
The new replicas will *also* be on timeline '1'.

#5: There's no such thing as a 'timeline decrement'.  The new cluster
(either on the primary or the replica) should only ever see itself as
being on timeline '1' when starting up after the pg_upgrade and before a
promotion happens.

#6: In note 33, it's unclear what this is referring to.  There's no WAL
which would have been generated by the pg_upgrade (that process is not
WAL'd).  Perhaps some activity needed to be done on the primary before a
new restorepoint would happen on the replica, due to how pg_basebackup
needs a restorepoint to begin from when working on a replica.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Will there ever be support for Row Level Security on Materialized Views?

2018-08-28 Thread Stephen Frost
Greetings,

* Dean Rasheed (dean.a.rash...@gmail.com) wrote:
> On 28 August 2018 at 01:49, Alvaro Herrera  wrote:
> > On 2018-Aug-27, Ken Tanzer wrote:
> >>- In the scheme of things, is it a lot of work or not so much?
> >
> > Probably not much.
> 
> Yeah, it doesn't seem like it would be particularly difficult, but it
> would probably still be a reasonable amount of work to go round
> finding all the places in code that would need updating. I.e., I think
> it would be more mechanical work, than anything fundamentally
> challenging.
> 
> On the face of it, it seems like it might be a reasonable thing to
> support, but I wonder, is this really just syntactic sugar for
> creating a security barrier view on top of the materialized view?
> 
> When RLS was originally implemented, that same question was asked for
> tables, but the answer was "no" because RLS on a table gives you
> fine-grained (row-level) control over what data in the table can be
> modified as well as read, which a SB view doesn't give you. But for a
> MV view, that's not a consideration, so what would RLS on a MV
> actually give you?

I see value in being able to have a consistent set of policies which
are applied across tables, views, matviews, etc.  Also, with simple
updateable views, updates can be done, so there's also that to consider.
Ultimately, I do think it'd be good to have RLS support for views, mat
views, and foreign tables.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: locate DB corruption

2018-09-02 Thread Stephen Frost
Greetings,

* Dave Peticolas (d...@krondo.com) wrote:
> On Sat, Sep 1, 2018 at 5:09 PM Adrian Klaver 
> wrote:
> 
> > On 09/01/2018 04:45 PM, Dave Peticolas wrote:
> >
> > > Well restoring from a backup of the primary does seem to have fixed the
> > > issue with the corrupt table.
> >
> > Pretty sure it was not that the table was corrupt but that transaction
> > information was missing from pg_clog.
> >
> > In a previous post you mentioned you ran tar to do the snapshot of
> > $PG_DATA.
> >
> > Was there any error when tar ran the backup that caused you problems?
> 
> Well the interesting thing about that is that although the bad table was
> originally discovered in a DB restored from a snapshot, I subsequently
> discovered it in the real-time clone of the primary from which the backups
> are made. So somehow the clone's table became corrupted. The same table was
> not corrupt on the primary, but I have discovered an error on the primary
> -- it's in the thread I posted today. These events seem correlated in time,
> I'll have to mine the logs some more.

Has this primary been the primary since inception, or was it promoted to
be one at some point after first being built as a replica..?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Why my query not doing index only scan

2018-09-10 Thread Stephen Frost
Greetings,

* Arup Rakshit (a...@zeit.io) wrote:
> I have an index defined "inspector_tool_idx4_1" UNIQUE, btree (company_id, 
> item_code, deleted_at). Now I am using the *company_id* column in the where 
> clause, and selecting just the *item_code* field for all matching rows. I 
> expected the planner to do an index-only scan here, but it is doing a bitmap 
> index scan. Any idea why it is not doing what I expected it to do?

One possibility is that the visibility map isn't current.

Indexes don't include visibility information.  The way an index-only
scan works is that we track pages which are 'all visible' (meaning that
every tuple on that page is visible to all running transactions) in a
seperate file called the 'visibility map' (aka the VM).  The VM is
updated by the VACUUM process- but we only automatically run a VACUUM
(with the autovacuum process) when thresholds have been reached for the
number of UPDATE'd or DELETE'd tuples.

What this means is that if you are playing around in development and
just COPY a bunch of rows into a table, but don't ever UPDATE or DELETE
the rows in that table, then you'll almost never get an index-only scan
because the VM won't be current (and PG knows this).

Make sure to do a VACUUM after loading data (and letting any ongoing
transactions finish) and then re-test.  That should make sure that
the VM is current and make it more likely that PG will do an index-only
scan.  Not a guarantee still, but that's the first thing I'd try, based
on what you've shared here.
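
Concretely, something like this (the table name here is only a guess
based on your index name, and 'mydb' is a placeholder); if relallvisible
ends up close to relpages after the VACUUM, an index-only scan becomes
much more attractive to the planner:

  $ psql -d mydb -c 'VACUUM (VERBOSE, ANALYZE) inspector_tool'
  $ psql -d mydb -c "SELECT relname, relpages, relallvisible FROM pg_class WHERE relname = 'inspector_tool'"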

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Why my query not doing index only scan

2018-09-10 Thread Stephen Frost
Greetings,

* Arup Rakshit (a...@zeit.io) wrote:
> I would like to ask one more question related to this topic. When I take a 
> dump from production, and restore it to development DB, what are the commands 
> I generally need to run to make the dev DB quack close to production?

The best way to get a prod-like environment in development is to use a
file-level backup tool, like pgBackRest.  There's a few nice things that
does:

- Copies the database files at a physical level, making it much closer
  to what prod is like
- Checks your database checksums (if you have them enabled, which
  hopefully you do, see initdb -k)
- Verifies your backup/restore approach
- Provides the fastest way to perform a restore in the event of an issue
- Allows you to do point-in-time-recovery (PITR)
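
As a sketch, pulling a prod backup onto a dev box could be as simple as
this (the stanza name and target time are placeholders, and the dev host
needs access to the repository; you may also want --target-action
depending on what the cluster should do when it reaches the target):

  # with the dev cluster stopped
  $ pgbackrest --stanza=main --delta --type=time \
      "--target=2018-09-10 00:00:00" restore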

Using pg_dump/pg_restore takes a logical export of the database and then
imports it into a new PG cluster.  Even if you do VACUUM ANALYZE after
doing pg_dump/restore, you're going to end up with (somewhat, at least)
different stats, all the tables will be completely packed (no bloat, no
space for new tuples to go into existing pages), all of the indexes will
be completely rebuilt and pristine, etc.  In the end, it's pretty
different and while it'll behave in a similar manner to prod in a lot
of ways, there'll be a lot of ways it doesn't too.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct plan

2018-09-14 Thread Stephen Frost
Greetings,

* Adrian Klaver (adrian.kla...@aklaver.com) wrote:
> On 9/14/18 1:31 AM, Chris Travers wrote:
>On Wed, Sep 12, 2018 at 10:53 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> >    I wrote:
> > Stephen Frost <sfr...@snowman.net> writes:
> > >> We seem to be a bit past that timeline...  Do we have any update
> >on when
> > >> this will be moving forward?
> > >> Or did I miss something?
> >
> > > Nope, you didn't.  Folks have been on holiday which made it hard
> >to keep
> > > forward progress going, particularly with respect to selecting
> >the initial
> > > committee members.  Now that Magnus is back on shore, I hope we can
> > > wrap it up quickly --- say by the end of August.
> >
> >I apologize for the glacial slowness with which this has all been
> >moving.
> >The core team has now agreed to some revisions to the draft CoC based on
> >the comments in this thread; see
> >
> >https://wiki.postgresql.org/wiki/Code_of_Conduct
> >
> >(That's the updated text, but you can use the diff tool on the page
> >history tab to see the changes from the previous draft.)
> >
> >I really have to object to this addition:
> >"This Code is meant to cover all interaction between community members,
> >whether or not it takes place within postgresql.org
> >infrastructure, so long as there is not another
> >Code of Conduct that takes precedence (such as a conference's Code of
> >Conduct)."

I was wondering about that myself and rather had an objection to
implying that this CoC doesn't apply when there's a CoC set up for some
event.  The CoC for an event is typically going to be thinking about
things from the event's timeline (which is on the order of days),
whereas something which happened at an event reflects on the community
and should also be addressed at that level.

> I second that objection. It is not in PGDG's remit to cure the world, for
> whatever form of cure you ascribe to. This is especially true as 'community
> member' has no strict definition.

The goal of this CoC isn't to cure the world, it's to define what's
acceptable behavior to continue to be a member of this community, to
participate in this community through the mailing lists, IRC, etc, and
to be seen as a representative of the community/project.

We certainly have both the right and the remit to define who we want to
have in our community and to represent this community and project to
other communities, projects, organizations, and to people in general.
This CoC is about making it clear what's acceptable and what isn't and
making it clear to everyone, including other communities, that we take
conduct seriously and have a mechanism for dealing with issues that's
fair and reasonable.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct plan

2018-09-14 Thread Stephen Frost
Greetings,

* Joshua D. Drake (j...@commandprompt.com) wrote:
> I think this is a complicated issue. On the one hand, postgresql.org has no
> business telling people how to act outside of postgresql.org. Full stop.

This is exactly what this CoC points out- yes, PG.Org absolutely can and
should consider the behavior of individuals as a whole, regardless of
where, when it comes to deciding if it's appropriate for that individual
to continue to be a member of this community.  The CoC isn't about
everyone in the world, nor is it trying to address the actions of
individuals who are not members of this community, but it's definitely
about more than just actions seen on these mailing lists.

> On the other hand if you are (note: contributor, not community member which
> is different) contributor to PostgreSQL, your actions speak about
> PostgreSQL. So I am not sure what a good plan of action here would be.

The line being drawn here isn't terribly clear and I don't know that
it's really useful to try and draw a line.  There's a limit to what PGDG
is able to do from a technical perspective, but anything which is able
to be done within PGDG should be done to distance the community and
project, to the fullest extent possible, from inappropriate behavior.
That could be someone causing problems on IRC or on the mailing lists or
somewhere else, even if that individual isn't listed as a contributor or
involved in the project in other ways.  Naturally, there are different
levels and that's why there's a CoC committee to consider what's fair
and reasonable and at least part of that will probably take into
consideration an individual's role in the community.

> There was a time when Open Source was about code and community. It is clear
> that it is becoming about authority and politics.

This isn't actually anything new, to be clear, this is simply a
definition and documentation to provide clarity and a separate committee
which Core is delegating out responsibility to.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct plan

2018-09-14 Thread Stephen Frost
Greetings,

(trimmed to -general, tho I don't know if it'll really help)

* James Keener (j...@jimkeener.com) wrote:
> > To many of us, we absolutely are a community. Remember, there are people
> > here who have been around for 20+ years, of which many have become close
> > friends, having started working on PostgreSQL as a hobby. We have always
> > seen the project as a community of like-minded technologists, and welcome
> > others that wish to join, whether just to ask a single question or to hang
> > around for the next 20 years. I do see your viewpoint, but I would counter
> > that coming here for help (for example) is quite different from calling
> > tech support at a vendor.
> 
> I fail to see how that makes everyone here part of a community anymore than
> I'm part of the "community" of regulars at a bar I walk into for the first
> time.

Does the bartender get to kick you out if you get into a fight?  Or if
you're rude or inappropriate towards the waitress?  Yup, doesn't matter
if it's the first time you were in the bar or not.

> As I said, the rules can and should apply within the list, but applying
> them outside the list is odd and reeks of authoritarianism.

This is more akin to an argument that the bartender can't ban you if you
got into a fight outside the bar- but it falls flat because, yeah,
they can.  Is the bartender likely to ban you because you made one rude
comment or said something on twitter that wasn't about their bar?
Probably not, but it doesn't mean it's not within their right to do so
if they found it particularly concerning (such as threats made against a
regular to the bar or such).

Ultimately, I do tend to agree with the other points made on this thread
that we end up throwing up a lot of 'straw men' attacks and that
analogies tend to not work out too well in the end, but that's part of
why we have a committee made up of reasonable people to consider a
particular complaint and address it, or not, as appropriate.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct plan

2018-09-14 Thread Stephen Frost
Greetings,

* James Keener (j...@jimkeener.com) wrote:
> > > I fail to see how that makes everyone here part of a community anymore
> > than
> > > I'm part of the "community" of regulars at a bar I walk into for the
> > first
> > > time.
> >
> > Does the bartender get to kick you out if you get into a fight?  Or if
> > you're rude or inappropriate towards the waitress?  Yup, doesn't matter
> > if it's the first time you were in the bar or not.
>
> You're perverting and twisting my argument. Don't do that.

I was trying to follow your analogy.  My apologies that it's not a great
one, I raised that same concern in the part of my email you omitted.

> My comment was that I'm not part of the "community" of the bar by simply
> walking into the bar; not that the bar has to serve me.
> 
> Please try to argue only what's being argued and not what you think you're
> reading into my comments.

The point I was making is that these lists are more like the bar and the
list manager like the bartender.  Yes, actions outside of the lists can
impact if someone's allowed to participate on these lists.  There's, of
course, a test of reasonableness and things like disagreements about
political views expressed outside of these lists aren't likely to make
the CoC feel that someone isn't appropriate for participation, even if a
complaint is made, but that doesn't mean that only actions on the list
are considered.

(note that I'm not part of the CoC, nor core, this is my expression of
how I feel things should be, as a member of this community)

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct plan

2018-09-14 Thread Stephen Frost
Greetings,

* Dimitri Maziuk (dmaz...@bmrb.wisc.edu) wrote:
> On 09/14/2018 12:46 PM, Peter Geoghegan wrote:
> > On Fri, Sep 14, 2018 at 10:31 AM, Dimitri Maziuk  
> > wrote:
> >> So let me get this straight: you want to have a "sanctioned" way to deny
> >> people access to postgresql community support channel?
> > 
> > Yes.
> 
> A question to TPTBs, then: once The Great Plan is implemented, will I be
> automagically unsubscribed from all postgres lists because I did not
> explicitly agree to abide by The Rules And Regulations back when I
> susbscribed?

The short answer is: probably.  We have been working for a while to
implement a mechanism to get people to explicitly opt-in for certain
things, like having all posts made public, due to GDPR requirements, and
I'm kinda hoping that this gets folded into it.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct

2018-09-18 Thread Stephen Frost
Greetings,

* Chris Travers (chris.trav...@gmail.com) wrote:
> I said I would stand aside my objections after the last point I mentioned
> them but I did not feel that my particular objection and concern with
> regard to one specific sentence added got much of a hearing.  This being
> said, it is genuinely hard to sort through the noise and try to reach the
> signal.  I think the resurgence of the debate about whether we need a code
> of conduct made it very difficult to discuss specific objections to
> specific wording.  So to be honest the breakdown was mutual.

I would ask that you, and anyone else who has a suggestion for how to
improve or revise the CoC, submit your ideas to the committee by
email'ing c...@postgresql.org.

As was discussed previously, the current CoC isn't written in stone and
it will be changed and amended as needed.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Code of Conduct

2018-09-19 Thread Stephen Frost
Greetings,

* Francisco Olarte (fola...@peoplecall.com) wrote:
> I will happily pardon brevity ( although I would not call a ten line
> sig plus a huge bottom quote "breve", and AFAIK it means the same in
> english as in spanish ) and/or typos, but the "I am not responsible"
> feels nearly insulting. Did someone force you to use "this device" (
> which you seem to perceive as inadequate for a nice answer ) to reply,
> or did you choose to do it ? ( real, not rethoric question, but do not
> answer if you feel  its inadequate )

Let's please try to keep the off-topic discussion on these lists to a
minimum.

> As an aside, is this kind of afirmations and/or my response to it a
> violation of the current CoC ?

There's a way to find out the answer to that question, but it's
certainly not to send an email to this list asking about it.  Please
review the policy, and follow the process outlined there if you feel the
need to.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Setting up continuous archiving

2018-09-26 Thread Stephen Frost
Greetings,

* Yuri Kanivetsky (yuri.kanivet...@gmail.com) wrote:
> I'm trying to compile a basic set of instruction needed to set up
> continuous archiving and to recover from a backup. I'm running
> PostgreSQL 9.3 on Debian Stretch system.

9.3 is about to be end-of-life in just another month or so, see:

https://www.postgresql.org/support/versioning/

As mentioned, this is an extremely complicated subject and you should
really use one of the tools that's been written to do exactly this.
Here's a few comments as to why-

> Setting up continuous archiving
> 
> * Set up WAL archiving
> 
> * on backup server under postgres user
> 
> * create /var/lib/postgresql/wal_archive dir
> 
> $ mkdir /var/lib/postgresql/wal_archive
> 
> * on database server under postgres user
> 
> * generate ssh key
> 
> $ ssh-keygen -f /var/lib/postgresql/.ssh/id_rsa   # providing
> path to key file makes it
>   # to not ask 
> questions
> 
> * add corresponding record to known_hosts file
> 
> $ ssh-keyscan -t rsa BACKUP_SRV >> ~/.ssh/known_hosts
> 
> * locally
> 
> * authorize login from database to backup server
> 
> $ ssh DATABASE_SRV cat ~/.ssh/id_rsa.pub | ssh BACKUP_SRV
> 'mkdir --mode 0700 .ssh; cat >> ~/.ssh/authorized_keys; chmod 0600
> .ssh/authorized_keys'
> 
> * on database server under root
> 
> * change postgresql.conf
> 
> wal_level = archive
> archive_mode = on
> archive_command = 'rsync -a %p
> BACKUP_SRV:/var/lib/postgresql/wal_archive/%f'

This rsync command does nothing to verify that the WAL file has been
persisted to disk on the backup server, which is a problem if the backup
server crashes or there's some kind of issue with it after the rsync
finishes (you'll end up with gaps in your WAL stream which could prevent
you from being able to restore a backup or from being able to do PITR).

A good backup tool would also calculate a checksum of the WAL file and
store that independently, verify that the WAL file is for the cluster
configured (and not for some other cluster because someone mistakenly
tried to start archiving two primaries into the same location), verify
that the size of the WAL file is what's expected, and probably do a few
other checks that I'm not remembering right now, but which tools like
pgBackRest do.
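
Just as an illustration of how little ends up in your hands when you use
such a tool- with pgBackRest, once a stanza is configured (the stanza
name below is just a placeholder), the archiving side of this reduces to:

    archive_mode = on
    archive_command = 'pgbackrest --stanza=demo archive-push %p'

and the tool takes care of the fsync, checksum and sanity checks
mentioned above.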

> * restart PostgreSQL
> 
> # systemctl resart postgresql
> 
> * Make a base backup
> 
> * on database server under root
> 
> * add a line to postgresql.conf
> 
> max_wal_senders = 1
> 
> * add a line to pg_hba.conf
> 
> host  replication  replication  BACKUP_SRV_IP/BACKUP_SRV_NETMASK  
> trust
> 
> * restart PostgreSQL
> 
> # systemctl restart postgresql
> 
> * on database server under postgres user
> 
> * create replication user
> 
> CREATE USER replication WITH REPLICATION;
> 
> or
> 
> $ createuser --replication replication
> 
> * on backup server under postgres user
> 
> * make base backup
> 
> $ pg_basebackup -h DATABASE_SRV -U replication -D
> /var/lib/postgresql/base_backups/$(date +%Y%m%d-%H%M%S)

pg_basebackup is pretty good and it'll soon be able to perform
page-level checksum validation of the database while doing a backup,
assuming checksums have been enabled, but sadly it certainly didn't do
that in 9.3.  pg_basebackup should ensure that everything is persisted
to disk, but it doesn't do anything to protect against latent corruption
happening.  To do that, an independent manifest of the backup needs to
be built which tracks the checksum of every file backed up and then that
needs to be checked when performing a restore.

> Restoring from a backup
> 
> * under root
> 
> * stop PostgreSQL if running
> 
> # systemctl stop postgresql
> 
> * under postgres user
> 
> * move data dir
> 
> $ mv 9.3{,-$(date +%Y%m%d-%H%M%S)}
> 
> * copy backup
> 
> $ mkdir 9.3
> $ cp -r base_backups/TIMESTAMP 9.3/main
> 
> * copy unarchived segment files
> 
> $ find 9.3-TIMESTAMP/main/pg_xlog -maxdepth 1 -type f -exec cp -t
> 9.3/main/pg_xlog {} +

This is not something which I'd generally encourage doing..

> * create recovery.conf in 9.3/main
> 
> restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'

This restore command doesn't perform any validation of the WAL file
which is being pulled back from the archive.

> * under root
> 
> * start PostgreSQL
> 
> # systemctl start postgresql
> 
> A few notes.
> 
> Running out of space on backup server can lead in its turn to database
> server running out of space, since WAL segment files stop being
> archived and keep piling up. The same might happen when archiving
> falls behind. Which also widens the window for data loss.

Yes, that's a concern.  pgBackRest has an option to allow you to choose
if you want to let the WAL keep queueing up on the primary (and risk
running it out of disk space) or to start dropping WAL once a configured
queue size is reached, accepting a gap in the archive instead.

Re: PG security alerts

2018-09-27 Thread Stephen Frost
Greetings,

* Ravi Krishna (srkrish...@aol.com) wrote:
> Is there a place to get all PG related security alerts?  I saw this in IBM 
> site:

https://www.postgresql.org/support/security/

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: vacuum question

2018-09-30 Thread Stephen Frost
Greetings,

* Torsten Förtsch (tfoertsch...@gmail.com) wrote:
> I have a table with a really small number of rows, usually about 1500,
> sometimes may be up to 5000. The usage pattern of that table is such that
> rows are inserted and kept for a while, mostly seconds or minutes but
> theoretically up to 1 year. After that they are deleted. No updates, just
> insert, delete. The table is pretty actively written, sometimes >100
> transactions per second.

The number of live tuples is never over 5000..?  Or 5000 per
transaction, meaning the upper bound is more like 500 000?

> Although the table is frequently auto-vacuumed, its size is growing over
> time. The size of the table corresponds to the remaining number of pages in
> the autovacuum log.
> 
>  pages: 0 removed, 30069 remain, 0 skipped due to pins, 29985 skipped
> frozen
>  tuples: 136 removed, 1643 remain, 15 are dead but not yet removable
>  buffer usage: 44327 hits, 0 misses, 21 dirtied
>  avg read rate: 0.000 MB/s, avg write rate: 0.036 MB/s
>  system usage: CPU 0.06s/0.05u sec elapsed 4.59 sec
> 
> What does "29985 skipped frozen" mean? I presume these skipped buffers is
> the table bloat.

No, the pages skipped because they're frozen just means that there's
nothing for vacuum to do on those pages.

To look at bloat, use pgstattuple.
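
For example, something along these lines (the table name is just a
placeholder):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;
    SELECT * FROM pgstattuple('your_queue_table');

The dead_tuple_percent and free_percent columns give a much clearer
picture of bloat than the autovacuum log output does.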

> My way to get rid of the bloat is to cluster the table. That takes less
> than half a second and the next autovacuum log then looks like this:
> 
>  pages: 0 removed, 3493 remain, 0 skipped due to pins, 0 skipped frozen
>  tuples: 0 removed, 58873 remain, 57470 are dead but not yet removable
>  buffer usage: 9425 hits, 0 misses, 0 dirtied
>  avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
>  system usage: CPU 0.02s/0.03u sec elapsed 1.01 sec
> 
> For a number of autovacuum cycles "skipped frozen" remains 0 until
> eventually:
> 
>  pages: 0 removed, 4822 remain, 0 skipped due to pins, 4433 skipped frozen
>  tuples: 227 removed, 2609 remain, 13 are dead but not yet removable
>  buffer usage: 5269 hits, 0 misses, 315 dirtied
>  avg read rate: 0.000 MB/s, avg write rate: 2.065 MB/s
>  system usage: CPU 0.03s/0.01u sec elapsed 1.19 sec
> 
> Is there anything I can do to prevent that bloat from accumulating in the
> first place?

What problem is the bloat causing?  If you can keep all of your
transactions short and autovacuum the table fast enough then the table
size should stabilize.

> Another thing that seems odd is the number of dead but not removable tuples
> in the record in the middle. Sifting through the log, I can see numbers up
> to 8. One of our replicas is configured with hot_standby_feedback. Can
> that be the culprit?

Yes, if you have hot_standby_feedback on then a long running transaction
on the replica would have the same effect as on the primary- PG is
unable to mark rows as reusable until the oldest transaction in the
system is after the commit where the rows were deleted.

If you need to be able to perform long running transactions on the
primary or replica w/ hot standby feedback enabled then there's a few
other approaches which can be used but they're more complicated and
involve things which aren't MVCC-safe (which is why they work to avoid
bloat, but does mean that those long-running transactions won't see rows
that maybe they should have if they look at this table).  Consider
having two tables where you flip between them periodically and empty the
prior one and then TRUNCATE it, then flip again, etc.  The TRUNCATE will
clear out all of the bloat, but will remove rows that a long running
transaction maybe should have been able to see (and which a routine
VACUUM would have kept, just in case).
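
A very rough sketch of that approach, with made-up names and leaving the
"which table is active right now" bookkeeping to the application:

    CREATE TABLE queue_a (LIKE queue INCLUDING ALL);
    CREATE TABLE queue_b (LIKE queue INCLUDING ALL);

    -- writers insert into queue_a for a while, then switch to queue_b;
    -- once queue_a has been fully drained by the workers:
    TRUNCATE queue_a;  -- reclaims all space immediately, but is not MVCC-safe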

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: Rearchitecting for storage

2019-07-21 Thread Stephen Frost
Greetings,

* Matthew Pounsett (m...@conundrum.com) wrote:
> On Fri, 19 Jul 2019 at 11:25, Peter J. Holzer  wrote:
> > On 2019-07-19 10:41:31 -0400, Matthew Pounsett wrote:
> > > Okay.  So I guess the short answer is no, nobody really knows how to
> > > judge how much space is required for an upgrade?  :)
> >
> > As I understand it, a pg_upgrade --link uses only negligible extra
> > space. It duplicates a bit of householding information, but not your
> > data tables or indexes. Your 18 TB table will definitely not be duplicated
> > during the upgrade if you can use --link.
> 
> The documentation for pg_upgrade --link says that the old copy is no longer
> usable, 

That's not entirely true- the old copy is only no longer usable *after*
you've started the new version of the DB against those data files.  If
you haven't started the new major version of PG yet, then you can go
back to using the old version against those files.

> which means it's modifying files that are linked.

No, it doesn't.

> If it were only
> modifying small housekeeping files, then it would be most efficient not to
> link those, which would keep both copies of the db usable.

The catalog tables *aren't* linked.  Both copies of the DB are usable-
but only until you start the DB against one of the versions.  Once
you've started either the old version or the new version, you can't
switch.  If you started the old version, then you could do another
pg_upgrade, of course, but you can't use the new version as there will
have been changes made to the catalog tables and control file (which
aren't linked) that would have to be accounted for in the new version's
catalog by pg_upgrade.

> That seems
> incompatible with your suggestion that it doesn't need to modify the data
> files.  Depending on how it goes about doing that, it could mean a
> significant short-term increase in storage requirements while the data is
> being converted.

No, that's not the case- link mode doesn't copy the data files, it just
rebuilds the catalog tables and fixes up things in the new database
cluster (clog, wal, et al, not the user data tables/indexes).
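
For reference, a link-mode upgrade is just something along these lines
(the paths and versions here are placeholders):

    pg_upgrade --link \
        --old-bindir=/usr/lib/postgresql/9.6/bin \
        --new-bindir=/usr/lib/postgresql/11/bin \
        --old-datadir=/path/to/old/data \
        --new-datadir=/path/to/new/data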

> Going back to our recent 'reindex database' attempt, pgsql does not
> necessarily do these things in the most storage-efficient manner; it seems
> entirely likely that it would choose to use links to duplicate the data
> directory, then create copies of each data file as it converts them over,
> then link that back to the original for an atomic replacement.  That could
> eat up a HUGE amount of storage during the conversion process without the
> start and end sizes being very different at all.

No, that isn't how pg_upgrade works.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: GIST/GIN index not used with Row Level Security

2019-08-13 Thread Stephen Frost
Greetings,

* Derek Hans (derek.h...@gmail.com) wrote:
> Unfortunately only "alter function" supports "leakproof" - "alter operator"
> does not. Is there a function-equivalent for marking operators as
> leakproof? Is there any documentation for which operators/functions are
> leakproof?

Tom's query downthread provides the complete list.

Note that the list is not completely static- it's entirely possible that
additional functions can be made leak-proof, what's needed is a careful
review of the function code to ensure that it can't leak information
about the data (or, if it does today, a patch which removes that).  If
you have an interest in that then I'd encourage you to dig into the code
and look for possible leaks (Tom's already hinted in the direction you'd
want to go in) and then propose a patch to address those cases and to
mark the function(s) as leakproof.

> In my particular case, RLS is still useful even if operators are leaky as I
> control the application code and therefore can ensure leaky errors are
> handled. If it's possible to disable all checking for "leakproof", that
> would work for me.

There isn't a way to disable the leakproof-checking system.  Certainly
in the general case that wouldn't be acceptable and I'm not entirely
convinced by your argument that such an option should exist, though you
could go through and set all of the functions to be leakproof if you
really wish to.

> > If that's not possible, it sounds like it
> > > effectively blocks the use of GIN/GIST indexes when RLS is in use.
> >
> > There's a whole lot of daylight between "it doesn't pick an indexscan in
> > this one example" and "it effectively blocks the use of GIN/GIST".
> 
> True indeed :). Would you have a working example of using a GIN/GIST index
> with RLS? All the attempts I've made have ended in seq scans. In practice,
> I'm looking to implement fuzzy search using trigrams, so % and %> operators
> are what matter to me. ~~ also happens to fail. Should I expect to be able
> to use any of these with RLS, large amounts of data and reasonable
> performance?

Functions that aren't marked leakproof aren't going to be able to be
pushed down.

> Your description of leakproof (and the documentation I've found) makes it
> sound like I'm not just hitting an isolated problem, but a general problem
> with RLS that represents a substantial limitation and is likely worth
> documenting.

There's some documentation regarding leakproof functions here:

https://www.postgresql.org/docs/current/ddl-rowsecurity.html

and here:

https://www.postgresql.org/docs/11/sql-createfunction.html

Of course, patches are welcome to improve on our documentation.

One thing that it sounds like you're not quite appreciating is that in
the general case, verifying that a function is leakproof isn't optional.
Without such a check, any user could create a function and then get PG
to push that function down below the RLS checks and therefore gain
access to the data that they aren't supposed to be able to see.

All that said, there's quite a few functions that *are* marked as
leakproof already and they're quite handy and work well with RLS
already, as I expect you'll see when you go querying pg_proc.
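
For example, something like this will show you which functions are
already marked leakproof:

    SELECT p.oid::regprocedure
      FROM pg_proc p
     WHERE p.proleakproof;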

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: GIST/GIN index not used with Row Level Security

2019-08-13 Thread Stephen Frost
Greetings,

Please don't top-post on these lists.

* Derek Hans (derek.h...@gmail.com) wrote:
> Thanks for the detailed response, super helpful in understanding what's
> happening, in particular understanding the risk of not marking functions as
> leakproof. I'll take a look at the underlying code to understand what's
> involved in getting a function to be leakproof.

Great.

> That said, it does seem like it should be possible and reasonable to
> specify that a user should have access to the table stats so that the query
> planner works as expected. Maybe it comes down to the fact that RLS is
> still a work in progress, and I shouldn't be relying on it unless I'm
> really certain it supports the functionality I need.

PostgreSQL is still very much a work in progress. :)

> I've updated word_similarity_op(text,text) to be leakproof, and
> pg_proc agrees it is. I'm assuming word_similarity_op() is equivalent to
> <%, though I haven't found explicit confirmation. However, using
> word_similarity() instead of <% on a 100k row table, without any RLS
> involved, doesn't make use of the index, while using <% does. Obviously,
> adding the RLS doesn't make that any better. Any idea what might be the
> cause?

Just to be clear, you should be looking at pg_operator (oprcode) to
determine the function that is under the operator that you wish to
change to being leakproof.

Note that the selectivity functions are associated with the operator,
not the function itself.
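
Something along these lines will show you what's actually underneath an
operator- the function that comes back is what you'd audit and, only
after that review, consider marking leakproof (the ALTER below is just a
placeholder, fill in whatever oprcode returns):

    SELECT oprname, oprcode
      FROM pg_operator
     WHERE oprname IN ('%', '<%', '%>');

    -- then, after reviewing that function's source:
    -- ALTER FUNCTION <function_from_oprcode>(text, text) LEAKPROOF;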

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: postmaster utilization

2019-08-19 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> On 8/19/19 5:40 AM, Shiwangini Shishulkar wrote:
> >We have scheduled postgres full backup on centos 7 machine. DB size is
> >around 66 GB. We observed while backup is running, postmaster CPU %
> >reaches to 90 - 100%,which results very strange behavior: that we are not
> >able to perform any DDL and DML on to the database during the time.
> >Please suggest a way to limit it's CPU utilization or else let us know how
> >we can access the database during the backup window. Please let us know on
> >priority since this is our prod database.
> 
> Are you compressing the backup?
> How many threads are you running?
> How many CPUs on your system?
> What version of Postgres?

... What are you using to run the backup?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: postmaster utilization

2019-08-20 Thread Stephen Frost
Greetings,

* Kyotaro Horiguchi (horikyota@gmail.com) wrote:
> At Mon, 19 Aug 2019 10:07:30 -0400, Stephen Frost  wrote 
> in <20190819140730.gh16...@tamriel.snowman.net>
> > * Ron (ronljohnso...@gmail.com) wrote:
> > > On 8/19/19 5:40 AM, Shiwangini Shishulkar wrote:
> > > >We have scheduled postgres full backup on centos 7 machine. DB size is
> > > >around 66 GB. We observed while backup is running, postmaster CPU %
> > > >reaches to 90 - 100%,which results very strange behavior: that we are not
> > > >able to perform any DDL and DML on to the database during the time.
> > > >Please suggest a way to limit it's CPU utilization or else let us know 
> > > >how
> > > >we can access the database during the backup window. Please let us know 
> > > >on
> > > >priority since this is our prod database.
> > > 
> > > Are you compressing the backup?
> > > How many threads are you running?
> > > How many CPUs on your system?
> > > What version of Postgres?
> > 
> > ... What are you using to run the backup?
> 
> It seems to have been reported as BUG #15961..
> 
> https://www.postgresql.org/message-id/15965-413bf5d18aaef...@postgresql.org
> 
> > PostgreSQL version: 11.4
> > Operating system:   CentOS Linux release 7.6.1810 (Core)
> > pg_dump -U postgres -d wg -f wg.sql 

Ah, then the pg_dump is clearly what's locking the tables against DDL.
That wouldn't impact DML though (unless, of course, a DDL was issued and
then blocked behind pg_dump, and then DML was attempted on the same
table as that would then be blocked behind the DDL).

There's not much help for that when you're taking logical backups,
unfortunately.  Of course, you could take physical backups instead.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Retroactively adding send and recv functions to a type?

2019-08-20 Thread Stephen Frost
Greetings,

* Vik Fearing (vik.fear...@2ndquadrant.com) wrote:
> On 19/08/2019 19:32, Tom Lane wrote:
> > "Johann 'Myrkraverk' Oskarsson"  writes:
> >> I meant ALTER TYPE.  Adding the send and recv functions doesn't seem
> >> to supported by ALTER TYPE.
> >> Is there a workaround for this?
> > You could manually update the pg_type row, and then if you were
> > being fussy, add pg_depend entries showing the type depends on
> > the functions.
> 
> We generally discourage updating the catalogs directly.  This was why I
> wrote the CREATE/ALTER DATABASE patch in 2014 that you helped me with
> (fbb1d7d73f8).
> 
> I'm willing to work on a patch for ALTER TYPE if it has a chance of
> being accepted.

Seems pretty clear that it'd be a useful thing to have, so +1 from me,
at least.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_xlog on slaves has grown to 200GB

2019-08-20 Thread Stephen Frost
Greetings,

* Vikas Sharma (shavi...@gmail.com) wrote:
> We are using postgresql 9.5 with repmgr 3.3.2 in streaming replication
> setup with 1 master and 2 slaves. I have noticed that the pg_xlog on slaves
> has grown to 200GB and is still growing.
> 
> Please advise why pg_xlog is growing and not pruning itself, is there any
> parameter I need to setup to accomplish this? or repmgr will control it
> itself.
> 
> I am not 100% sure but feel it pg_xlog never used to grow this much or was
> clearing itself, I can see there are WALs since May 2019 but not before
> that. I want to understand why the WALs started accumulating since then. we
> have this setup since more than a year.

Check your logs.  Seems most likely that something broke archiving,
though it may also be simply that you have an old replication slot
that's holding back the removal of the WAL.
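
A quick way to check for a stale slot on the primary:

    SELECT slot_name, slot_type, active, restart_lsn
      FROM pg_replication_slots;

An inactive slot with an old restart_lsn will pin all of the WAL from
that point forward.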

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: SSPI auth and mixed case usernames

2019-08-30 Thread Stephen Frost
Greetings,

* Niels Jespersen (n...@dst.dk) wrote:
>Hello Magnus
>Thank you for your prompt reply. 
>I’m not sure I understand your last statement. I want to achieve that
>regardless of the case of the entered username is logged into the same
>Postgres user (whose name is created in all lowercase).
>In other words, Windows usernames one day entered as XYz, the next day
>entered as xYz, should logon to Postgres user xyz.

You just have to make sure that the users tell whatever program they're
using to connect to PG (like psql, pgAdmin, whatever) that their PG
username is 'xyz'.  If they try to log in as 'XYZ' then that's gonna be
a different PG user.

If you have some other application that's being used to connect to PG
then you could do the lowercase in the app...

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pgbackrest restore to new location?

2019-09-17 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> On 9/17/19 6:48 AM, David Steele wrote:
> >On 9/17/19 7:23 AM, Luca Ferrari wrote:
> >>On Tue, Sep 17, 2019 at 12:00 PM Ron  wrote:
> >>>The real problem is that after doing that, "pg_ctl start -D
> >>>/path/to/new/data" fails with "PANIC: could not locate a valid checkpoint
> >>>record".
> >Sounds like backup_label is missing or has been deleted.
> >
> >The easiest way to restore to a new location is just to copy
> >pgbackrest.conf from the primary (or create a new one) with the same
> >stanza and then alter pg1-path in pgbackrest.conf or at the command line.
> 
> That's what I did.  (Also, I opened Issue #839 in GitHub  All of the log
> files are attached there.)

Per the discussion in that issue, it looks like there's some issue with
the restore command failing to be able to pull the needed WAL from the
repo.

That said- it brings up a pretty serious issue that should be discussed,
and that's nuking this:

HINT:  If you are not restoring from a backup, try removing the file 
".../backup_label".

That hint is absolutely wrong these days when many tools have been
updated to use the non-exclusive backup method and it just ends up
getting people into trouble and, worse, can result in them having
corrupted clusters.

I'll get a patch into the next commitfest to remove it.  The exclusive
method has been deprecated for quite a few releases and we should stop
giving bad advice on the assumption that people are using it.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pgbackrest restore to new location?

2019-09-18 Thread Stephen Frost
Greetings,

* David Steele (da...@pgmasters.net) wrote:
> On 9/17/19 10:03 PM, Stephen Frost wrote:
> > I'll get a patch into the next commitfest to remove it.  The exclusive
> > method has been deprecated for quite a few releases and we should stop
> > giving bad advice on the assumption that people are using it.
> 
> We updated the error message and hints in c900c152, but it was not
> back-patched.

Ohh, right, good. :)

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgbackrest - question about restoring cluster to a new cluster on same server

2019-09-19 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> On 9/18/19 8:58 PM, David Steele wrote:
> >On 9/18/19 9:40 PM, Ron wrote:
> >>I'm concerned with one pgbackrest process stepping over another one and
> >>the restore (or the "pg_ctl start" recovery phase) accidentally
> >>corrupting the production database by writing WAL files to the original
> >>cluster.
> >This is not an issue unless you seriously game the system.  When a
> >cluster is promoted it selects a new timeline and all WAL will be
> >archived to the repo on that new timeline.  It's possible to promote a
> >cluster without a timeline switch by tricking it but this is obviously a
> >bad idea.
> 
> What's a timeline switchover?

Put simply, it's a branch off of the current WAL stream on to a new WAL
stream and it happens whenever there's a promotion.

Forgive the ASCII art, but-

A---> timeline 1, where things start
 \-> a promotion happened at time A, new WAL is on timeline 2

Consider an async replication scenario, where the network on the primary
is lost but it keeps writing out WAL and accepting new commits, but at a
time "A" we give up on it and promote the replica, so the replica
switches to timeline 2 and starts accepting writes.  Now we are in a
situation where two systems are generating WAL (the network partitioned
old primary, and the replica-now-primary).  Having the promotion switch
to a timeline makes it clear where that promotion happened and where the
replica-now-primary's WAL stream started.

This is actually what pg_rewind is based around too- to re-master the
old primary, it'll find that split point A and "rewind" (well, not
really, because it just grabs the pages, but whatever) the old primary
back to A and then the old primary can follow the new primary on
timeline 2.

> >So, if you promote the new cluster and forget to disable archive_command
> >there will be no conflict because the clusters will be generating WAL on
> >separate timelines.
> 
> No cluster promotion even contemplated.

Ah, but you are talking about a cluster promotion, though you don't
realize it.  Any time there is an "at some point, I want to stop replaying
WAL and start accepting new changes", there's a timeline switch and
notionally a promotion.

> The point of the exercise would be to create an older copy of the cluster --
> while the production cluster is still running, while production jobs are
> still pumping data into the production database -- from before the time of
> the data loss, and query it in an attempt to recover the records which were
> deleted.

Sure, that's all entirely possible and shouldn't be an issue.  When you
go through the restore process and specify a point where you want the
restore to stop, so that you can connect and pull the down the table,
when PG reaches that point it'll promote and do a timeline switch.

Now, if you don't actually want that restore to promote and come up as a
system that you can write to, you could instead say 'pause', and then
connect to the database and grab whatever data you needed.  That should
also avoid the concern around archive command, provided you never
actually let that system finish recovery and instead just shut it down
while it's still read-only.
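
As a rough sketch (pgBackRest writes most of this for you during the
restore, and the stanza name and target time below are obviously made
up), the recovery.conf for that kind of "restore and pause" would look
something like:

    restore_command = 'pgbackrest --stanza=demo archive-get %f "%p"'
    recovery_target_time = '2019-09-18 03:00:00'
    recovery_target_action = 'pause'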

If you want to play around with this stuff and see what happens with a
promote, or try doing a pause instead, you might be interested in:

https://learn.crunchydata.com/

and specifically the pgbackrest one:

https://learn.crunchydata.com/pg-administration/courses/basic-postgresql-for-dbas/pgbackrest/

Basically, it's kind of like a blog post where you can play around on a
scratch system that's built into the page and click through the steps to
see what happens, and change things around if you want.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pgbackrest - question about restoring cluster to a new cluster on same server

2019-09-19 Thread Stephen Frost
Greetings,

* Ron (ronljohnso...@gmail.com) wrote:
> I've been a DBA for 20+ years, and restored a **lot** of **copies** of
> production databases.  PostgreSQL has some seriously different concepts.
> With every other system, it's: restore full backup to new location, restore
> differential backup, apply some roll-forward logs and you're done.  No
> pausing, promoting, etc.

Yup, I agree entirely, PostgreSQL is different.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_receivexlog or archive_command

2019-10-01 Thread Stephen Frost
Greetings,

* Vikas Sharma (shavi...@gmail.com) wrote:
> I am wondering which one is the best way to archive the xlogs for Backup
> and Recovery - pg_receivexlog or archive_command.
> 
> pg_receivexlog seems best suited because the copied/archived file is
> streamed as it is being written to in xlog while archive_command only
> copies when the WAL is fully written to.

This really ends up depending on what your WAL volume is.  As mentioned,
you really don't want to write your own utility for archive_command,
since you really want to make sure that the WAL has actually been
archived and sync'd- so definitely use an existing tool which does that
for you.  The downside of pg_receivewal when it comes to high WAL volume
is that a single-threaded process just simply can't keep up, while
something being called from archive_command can be parallelized.
Perhaps, down the road, there'll be a tool that could parallelize
streaming of WAL also, though it would be pretty tricky to get right,
and if you're doing that much WAL, is it really an issue that it's
already chunked up nicely for archive_command..?

Note that there is also archive_timeout which you can set, to make sure
that you don't go too long with writes on the primary that haven't been
sent to the archive and stored.  If your transations are particularly
valuable, then having a synchronous standby setup (likely with two
replicas in a quorum-based sync setup) is probably the direction to
go in, so you can have a highly available environment.  Anything that's
async will mean you have a good chance of having some data loss if
things go wrong (even with pg_receivewal..).
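
To put some purely illustrative numbers and names on that (tune these to
your own environment, the standby names are placeholders):

    # postgresql.conf on the primary
    archive_timeout = '60s'                                   # bound how stale the archive can get
    synchronous_standby_names = 'ANY 1 (replica1, replica2)'  # quorum-based sync, v10+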

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_receivexlog or archive_command

2019-10-01 Thread Stephen Frost
Greetings,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 2019-09-23 10:25, Vikas Sharma wrote:
> > I am wondering which one is the best way to archive the xlogs for Backup
> > and Recovery - pg_receivexlog or archive_command.
> 
> I recommend using pg_receivexlog.  It has two important advantages over
> archive_command:  1) You can have multiple instances of pg_receivexlog
> running and copying things to different places.  This is complicated to
> do correctly with archive_command.  2) pg_receivexlog will fsync the
> files it writes.  This is also complicated to do correctly with
> archive_command.

Yes, it definitely is difficult to write your own archive_command, in
general, so, please, just don't.  Use one of the existing tools that
have been well tested and written specifically to work with PG and to
provide the guarantees that an archive command should.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Users, Roles and Connection Pooling

2019-10-02 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> A couple of pointers:

I generally agree with these comments.

> - This is a good setup if you don't have too many users.  Metadata
>   queries will start getting slow if you get into the tens of thousands
>   of users, maybe earlier.

While this seems plausible- I'd love to hear about exactly what you've
seen start to be a problem when getting up to that many users.  Are you
just referring to things like \du?  Or..?

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Users, Roles and Connection Pooling

2019-10-02 Thread Stephen Frost
Greetings,

(we don't top-post on these lists, fyi, please reply in-line and trim)

* Matt Andrews (mattandr...@massey.com.au) wrote:
> I have little experience in this area, but it seems like having a Postgres
> role for every application user is the right way to do things. It’s just
> that it also seems really inconvenient.

I agree that there are some drawbacks to it.

> For example how to map an application’s users/people table to Postgres
> roles? The pg_role name field is limited to 64 bytes, you can’t create a
> foreign key to pg_role. What’s the answer? Use UUIDs as usernames or
> something?

Yeah, it would be nice to have an answer to the FK issue when it comes
to roles (and possibly other things..).  The limit on length is annoying
but I'm not sure that it's a show-stopper.  I don't think using UUIDs is a
good idea, at all...

> There’s very little out there on this topic, but surely this has been done
> before.

Oh, absolutely, but with compromises, particularly around FKs and such.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Performance on JSONB select

2019-10-02 Thread Stephen Frost
Greetings,

* Michael Lewis (mle...@entrata.com) wrote:
> Much of indexing strategy depends on knowing the data like how many
> distinct values and what the distribution is like. Is JsonBField->>'status'
> always set? Are those three values mentioned in this query common or rare?
> Can you re-write this query to avoid using an OR in the where clause? Are
> you just wanting to add a GIN index for the jsonb paths? Or do you want
> indexed like below that are a bit stylized to this query?

If you know a field is going to always be there, you're better off, by
far, by just having a regular column for that value and a straight up
btree for it.  This saves a significant amount of space and makes it
much easier to index and work with.
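
As a hedged sketch of what that looks like (the table and column names
here are made up):

    ALTER TABLE my_table ADD COLUMN status text;
    UPDATE my_table SET status = jsonb_field->>'status';
    CREATE INDEX ON my_table (status);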

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208

2019-10-02 Thread Stephen Frost
Greetings,

* Aleš Zelený (zeleny.a...@gmail.com) wrote:
> But recovery on replica failed to proceed WAL file
> 00010FED0039  with log message: " invalid contrecord length
> 1956 at FED/38FFE208".

Err- you've drawn the wrong conclusion from that message (and you're
certainly not alone- it's a terrible message and we should really have a
HINT there or something).  That's an INFO-level message, not an error,
and basically just means "oh, look, there's an invalid WAL record, guess
we got to the end of the WAL available from this source."  If you had
had primary_conninfo configured in your recovery.conf, PG would likely
have connected to the primary and started replication.  One other point
is that if you actually did a promotion in this process somewhere, then
you might want to set recovery_target_timeline=latest, to make sure the
replica follows along on the timeline switch that happens when a
promotion happens.
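
For reference, a recovery.conf along these lines (host and user are
placeholders) would both stream from the primary when the archive runs
out and follow a timeline switch after a promotion:

    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=replicator'
    recovery_target_timeline = 'latest'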

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208

2019-10-02 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Aleš Zelený (zeleny.a...@gmail.com) wrote:
> >> But recovery on replica failed to proceed WAL file
> >> 00010FED0039  with log message: " invalid contrecord length
> >> 1956 at FED/38FFE208".
> 
> > Err- you've drawn the wrong conclusion from that message (and you're
> > certainly not alone- it's a terrible message and we should really have a
> > HINT there or something).
> 
> Yeah, those messages are all pretty ancient, from when WAL was new and not
> to be trusted much.  Perhaps the thing to do is move the existing info
> into DETAIL and make the primary message be something like "reached
> apparent end of WAL stream".

Yes, +1 on that.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: GSSAPI: logging principal

2019-10-09 Thread Stephen Frost
Greetings,

* Allan Jensen (pgl...@winge-jensen.dk) wrote:
> I have GSSAPI-login and user mapping to postgres working fine.

Great!

> Whenever i login to postgres I get a line like the following in the
> logfile:
> 
> connection authorized: user=testrole database=testdb SSL enabled
> (protocol=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384, bits=256,
> compression=off)

Minor side-note: we have GSSAPI encryption built-in starting with v12,
removing the need to also have SSL.

> What bothers me, is that i can't see what kerberos prinicpal was used
> during authentication.

Yeah, I'm afraid that's probably right.  In looking, I don't see any
particularly easy way.  I could have sworn I complained about this ages
ago (there might even be a patch somewhere in the depths of -hackers
from 5 years ago or more) but clearly it never made it in.

One thing that is kind of nice is that with v12 there's a new view where
you can view the state of existing connections, including the principal
they authenticate with: pg_stat_gssapi.
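
For example, on v12 something like this shows the principal for each
connected backend:

    SELECT a.pid, a.usename, g.gss_authenticated, g.principal, g.encrypted
      FROM pg_stat_activity a
      JOIN pg_stat_gssapi g USING (pid);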

> Is there any way to make postgres log the principal?

Would definitely be a good thing for us to have, and the CN for an
SSL-based connection.  I don't think it'd be hard for someone to hack up
a patch to do so.  I've added it to my list of "nice to haves" but it
seems unlikely I'll get any time in the near future to hack on it, so
if someone else wants to work on it, please feel free to do so...

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Is my lecturer wrong about PostgreSQL? I think he is!

2019-10-09 Thread Stephen Frost
Greetings,

First off- please try to craft a new email in the future rather than
respond to an existing one.  You may not realize this but there's some
headers that get copied when you do a reply that cause the email to show
up as being a reply, even if you remove all the "obvious" bits from it.

* Pól Ua Laoínecháin (lineh...@tcd.ie) wrote:
> 1) Is my lecturer full of it or does he really have a point?

He's full of it, as far as I can tell anyway, based on what you've
shared with us.  Just look at the committers and the commit history to
PostgreSQL, and look at who the largest contributors are and who they
work for.  That alone might be enough to surprise your lecturer with.

> 2) The actual concrete acknowledged problem with fsync that affected
> PostgreSQL - why didn't it affect Oracle? Or MySQL? Or did it but it
> was so rare that it never became apparent - it wasn't that obvious
> with PostgreSQL either - one of those rare and intermittent problems?

Databases that do direct I/O don't depend on fsync.  That said, I do
think this could have been an issue for Oracle if you ran it without
direct i/o.

> 3) Were there ever any problems with BSD?

As I understand it, no.

> 4) What is the OS of choice for *_serious_* PostgreSQL installations?

BSD and Linux are both quite popular platforms for running PG, and
people run very serious workloads on both.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Is my lecturer wrong about PostgreSQL? I think he is!

2019-10-10 Thread Stephen Frost
Greetings,

* Pól Ua Laoínecháin (lineh...@tcd.ie) wrote:
> > > 1) Is my lecturer full of it or does he really have a point?
> 
> > He's full of it, as far as I can tell anyway, based on what you've
> > shared with us.  Just look at the committers and the commit history to
> > PostgreSQL, and look at who the largest contributors are and who they
> > work for.  That alone might be enough to surprise your lecturer with.
> 
> The only non-PostgreSQL company that I could find was Fujitisu - where
> can I find a (list of) the others?

Not sure where you were looking...  The contributors list is here:

https://www.postgresql.org/community/contributors/

The committers list is here:

https://wiki.postgresql.org/wiki/Committers

The git tree is here:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=summary

Perhaps not the best stat, but you can view the contributions by
committer pretty easily, for 2018, here:

https://github.com/postgres/postgres/graphs/contributors?from=2018-01-01&to=2018-12-31&type=c

Note that this isn't very representative of the actual authors though-
we don't track those in the way git would prefer, instead we note who
the author of a given patch was in the commit message itself.

> > Databases that do direct I/O don't depend on fsync.  That said, I do
> > think this could have been an issue for Oracle if you ran it without
> > direct i/o.
> 
> I think that Oracle are big into asyncio? I know that you have to sudo
> dnf install some_library with a name like asio/asyncio or something
> like that?

Oracle supports both, but running with direct i/o is pretty popular,
yes.

> Anyway, why doesn't PostgreSQL use Direct I/O?

There's an awful lot that the kernel provides when it comes to things
like good read-ahead and dealing with disks and SSDs and such that we
(currently, at least) prefer to leverage instead of writing lots of new
code to deal with that ourselves, which would be required to use Direct
I/O (and not have it be completely terrible performance wise, anyway).

The whole issue behind fsync was because our expectation (and POSIX's,
if you ask me anyway) was different from what the Linux kernel was
providing (specifically, you could end up in a situation where an
fsync() call "worked" and didn't return an error, even though there
remained pages that were dirty and not written out).  Now, this is under
other error conditions typically and you'll get messages in the kernel
log about such failures usually, so if you're properly monitoring and
managing your systems there's a good chance you would have realized
there was a problem even though the Linux kernel was telling PG that
everything was fine (have backups!!).

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Minimum privilege for Backup and replication

2019-10-10 Thread Stephen Frost
Greetings,

* Timmy Siu (timmy@aol.com) wrote:
> Now, my question is -
> What is the Minimum Privilege of a pgsql Backup or Replication user?

To perform a file-level backup of PostgreSQL, your OS user will need
read access to all of the files in the data directory (you can use group
privileges for that as of v11, as I recall), and then it'll need to be
able to connect to PG as a user and be able to execute the
pg_start_backup() and pg_stop_backup() functions (you can grant that
access by issuing GRANT commands).  Depending on your file-level backup
solution (please do NOT roll your own...), you might also need to grant
access to functions like pg_switch_wal() too.  Check the documentation
for the backup solution.

We do not (yet, anyway..  maybe in v13) have a way to GRANT read access
to all tables in a database to be able to perform a database-wide export
of the data (using pg_dump).

For replication, you need to create a user with the 'replication' role
attribute and configure pg_hba.conf to allow the client to connect to
the replication database.
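
As a hedged sketch (the role names are placeholders, and the exact
function signatures to GRANT on differ a bit between major versions-
check \df pg_start_backup on your version; v11 signatures shown here):

    -- user for file-level backups
    CREATE ROLE backup_exec LOGIN;
    GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) TO backup_exec;
    GRANT EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) TO backup_exec;
    GRANT EXECUTE ON FUNCTION pg_switch_wal() TO backup_exec;

    -- replication user
    CREATE ROLE replicator LOGIN REPLICATION;

plus a pg_hba.conf line along the lines of:

    host  replication  replicator  192.168.0.0/24  scram-sha-256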

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Pgbackrest backup is too slow

2019-10-11 Thread Stephen Frost
Greetings,

* Ajay Pratap (ajaypra...@ameyo.com) wrote:
> I have a Centos 7 server which runs Postgresql 10.7. I am using pgbackrest
> to take db backup.
> Problem is backup is too slow.

Have you tried running 'top' to see what's going on?

> My data dir size is 9.6G and full backup runtime is 22 mins
> I also tried using process-max=3, full backup runtime = 21 mins

Erm, those numbers don't make any sense to me- as an example, we
regularly back up a 12GB database, from Dallas to San Fran, in 5
minutes.

I see you're requesting a checkpoint to start immediately, but how long
does the checkpoint actually take to complete?  Have you looked at your
PG logs..?

Have you tried running 'top' and then running the backup and looked to
see what's going on?  Is the CPU at 100%?  Are all the processes stuck
in 'D' state all the time (meaning you don't have enough i/o resources),
or...?

> In production we have a database of size more than 500GB. It would not be
> feasible for me to use it as it is.pgbackrest version is
> 2.18-1.rhel7.x86_64.
> installed from pgdg yum repo.

Given sufficient resources, we've seen pgbackrest push over 1TB/hour, so
there is definitely something strange going on here.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: jsonb_set() strictness considered harmful to data

2019-10-18 Thread Stephen Frost
Greetings,

* Ariadne Conill (aria...@dereferenced.org) wrote:
> On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver  
> wrote:
> > https://www.postgresql.org/docs/11/functions-json.html
> > " The field/element/path extraction operators return NULL, rather than
> > failing, if the JSON input does not have the right structure to match
> > the request; for example if no such element exists"
> 
> It is known that the extraction operators return NULL.  The problem
> here is jsonb_set() returning NULL when it encounters SQL NULL.
> 
> > Just trying to figure why one is worse then the other.
> 
> Any time a user loses data, it is worse.  The preference for not
> having data loss is why Pleroma uses PostgreSQL as it's database of
> choice, as PostgreSQL has traditionally valued durability.  If we
> should not use PostgreSQL, just say so.

Your contention that the documented, clear, and easily addressed
behavior of a particular strict function equates to "the database system
loses data and isn't durable" is really hurting your argument here, not
helping it.

The argument about how it's unintuitive and can cause application
developers to misuse the function (which is clearly an application bug,
but perhaps an understandable one if the function interface isn't
intuitive or is confusing) is a reasonable one and might be convincing
enough to result in a change here.

I'd suggest sticking to the latter argument when making this case.

> > > I believe that anything that can be catastrophically broken by users
> > > not following upgrade instructions precisely is a serious problem, and
> > > can lead to serious problems.  I am sure that this is not the only
> > > project using JSONB which have had users destroy their own data in
> > > such a completely preventable fashion.

Let's be clear here that the issue with the upgrade instructions was
that the user didn't follow your *application's* upgrade instructions,
and your later code wasn't written to use the function, as documented,
properly- this isn't a case of PG destroying your data.  It's fine to
contend that the interface sucks and that we should change it, but the
argument that PG is eating data because the application sent a query to
the database telling it, based on our documentation, to eat the data,
isn't appropriate.  Again, let's have a reasonable discussion here about
if it makes sense to make a change here because the interface isn't
intuitive and doesn't match what other systems do (I'm guessing it isn't
in the SQL standard either, so we unfortunately can't look to that for
help; though I'd hardly be surprised if they supported what PG does
today anyway).

As a practical response to the issue you've raised- have you considered
using a trigger to check the validity of the new jsonb?  Or, maybe, just
made the jsonb column not nullable?  With a trigger you could disallow
non-null->null transistions, for example, or if it just shouldn't ever
be null then making the column 'not null' would suffice.
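
As a rough sketch of both of those ideas (the table and column names are
made up):

    ALTER TABLE activities ALTER COLUMN data SET NOT NULL;

    -- or, if NULL is legitimate sometimes but a non-NULL value should
    -- never be silently replaced by NULL:
    CREATE FUNCTION forbid_null_overwrite() RETURNS trigger AS $$
    BEGIN
        IF OLD.data IS NOT NULL AND NEW.data IS NULL THEN
            RAISE EXCEPTION 'refusing to overwrite non-NULL data with NULL';
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER activities_data_guard
        BEFORE UPDATE ON activities
        FOR EACH ROW EXECUTE PROCEDURE forbid_null_overwrite();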

I'll echo Christoph's comments up thread too, though in my own language-
these are risks you've explicitly accepted by using JSONB and writing
your own validation and checks (or, not, apparently) rather than using
what the database system provides.  That doesn't mean I'm against
making the change you suggest, but it certainly should become a lesson
to anyone who is considering using primarily jsonb for their storage
that it's risky to do so, because you're removing the database system's
knowledge and understanding of the data, and further you tend to end up
not having the necessary constraints in place to ensure that the data
doesn't end up being garbage- thus letting your application destroy all
the data easily due to an application bug.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: jsonb_set() strictness considered harmful to data

2019-10-18 Thread Stephen Frost
Greetings,

* Ariadne Conill (aria...@dereferenced.org) wrote:
> On Fri, Oct 18, 2019 at 5:57 PM Christoph Moench-Tegeder
>  wrote:
> > ## Ariadne Conill (aria...@dereferenced.org):
> > > Why don't we fix the database engine to not eat data when the
> > > jsonb_set() operation fails?
> >
> > It didn't fail, it worked like SQL (you've been doing SQL for too
> > long when you get used to the NULL propagation, but that's still
> > what SQL does - check "+" for example).
> > And changing a function will cause fun for everyone who relies on
> > the current behaviour - so at least it shouldn't be done on a whim
> > (some might argue that a whim was what got us into this situation
> > in the first place).
> 
> NULL propagation makes sense in the context of traditional SQL.  What
> users expect from the JSONB support is for it to behave as JSON
> manipulation behaves everywhere else.  It makes sense that 2 + NULL
> returns NULL -- it's easily understood as a type mismatch.  It does
> not make sense that jsonb_set('{}'::jsonb, '{foo}', NULL) returns NULL
> because a *value* was SQL NULL.  In this case, it should, at the
> least, automatically coalesce to 'null'::jsonb.

2 + NULL isn't a type mismatch, just to be clear, it's "2 + unknown =
unknown", which is pretty reasonable, if you accept the general notion
of what NULL is to begin with.

And as such, what follows with "set this blob of stuff to include this
unknown thing ... implies ... we don't know what the result of the set
is and therefore it's unknown" isn't entirely unreasonable, but I can
agree that in this specific case, because what we're dealing with
regarding JSONB is a complex data structure, not an individual value,
that it's surprising to a developer and there can be an argument made
there that we should consider changing it.

> > Continuing along that thought, I'd even argue that your are
> > writing code which relies on properties of the data which you never
> > guaranteed. There is a use case for data types and constraints.
> 
> There is a use case, but this frequently comes up as a question people
> ask.  At some point, you have to start pondering whether the behaviour
> does not make logical sense in the context that people frame the JSONB
> type and it's associated manipulation functions.  It is not *obvious*
> that jsonb_set() will trash your data, but that is what it is capable
> of doing.  In a database that is advertised as being durable and not
> trashing data, even.

Having the result of a call to a strict function be NULL isn't
"trashing" your data.

> > Not that I'm arguing for maximum surprise in programming, but
> > I'm a little puzzled when people eschew thew built-in tools and
> > start implmenting them again side-to-side with what's already
> > there.
> 
> If you read the safe_jsonb_set() function, all we do is coalesce any
> SQL NULL to 'null'::jsonb, which is what it should be doing anyway,

I'm not convinced that this is at all the right answer, particularly
since we don't do that generally.  We don't return the string 'null'
when you do: NULL || 'abc', we return NULL.  There might be something we
can do here that doesn't result in the whole jsonb document becoming
NULL though.

> and then additionally handling any *unanticipated* failure case on top
> of that.  While you are arguing that we should use tools to work
> around unanticipated effects (that are not even documented -- in no
> place in the jsonb_set() documentation does it say "if you pass SQL
> NULL to this function as a value, it will return SQL NULL"), I am
> arguing that jsonb_set() shouldn't set people up for their data to be
> trashed in the first place.

The function is marked as strict, and the meaning of that is quite
clearly defined in the documentation.  I'm not against including a
comment regarding this in the documentation, to be clear, though I
seriously doubt it would actually have changed anything in this case.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Stephen Frost
Greetings,

* Dmitry Dolgov (9erthali...@gmail.com) wrote:
> If we want to change it, the question is where to stop? Essentially we have:
> 
> update table set data = some_func(data, some_args_with_null);
> 
> where some_func happened to be jsonb_set, but could be any strict function.

I don't think it makes any sense to try and extrapolate this out to
other strict functions.  Functions should be strict when it makes sense
for them to be- in this case, it sounds like it doesn't really make
sense for jsonb_set to be strict, and that's where we stop it.

> I wonder if in this case it makes sense to think about an alternative? For
> example, there is generic type subscripting patch, that allows to update a
> jsonb in the following way:
> 
> update table set jsonb_data[key] = 'value';
> 
> It doesn't look like a function, so it's not a big deal if it will handle NULL
> values differently. And at the same time one can argue, that people, who are
> not aware about this caveat with jsonb_set and NULL values, will most likely
> use it due to a bit simpler syntax (more similar to some popular programming
> languages).

This seems like an entirely independent thing ...

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_hba & ldap

2019-10-22 Thread Stephen Frost
Greetings,

* Diego (mrstephenam...@gmail.com) wrote:
> I have a problem with ldap authentication, I have a ldap string like this:
> 
> host all all 0.0.0.0/0 ldap ldapserver="10.20.90.251
> 10.20.90.252 10.10.90.251 10.10.90.252" ldapport=389...
> 
> It is correct? if the firs server is down, pg will go to the next one to
> continue authenticating?

Yes, that looks like it should work- is it not?

> It's a pg11 and ldap is an ipa server

Note that with an IPA setup, similar to if you were running Active
Directory, you have Kerberos and a KDC available, which is a much better
authentication mechanism that removes the need for the database server to
reach out to another system to handle the authentication, and avoids
having the user's password sent to the database server.  You might want
to consider using that (which is called 'gssapi' in PostgreSQL, which is
basically generalized Kerberos) instead of LDAP.
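
As a rough example (the realm is made up, and you'd want to adjust the
address range and options for your environment), the pg_hba.conf line
might end up looking something like:

    host  all  all  0.0.0.0/0  gss  include_realm=0 krb_realm=EXAMPLE.ORG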

Thanks,

Stephen




Re: pg_basebackup + incremental base backups

2019-12-03 Thread Stephen Frost
Greetings,

* Christopher Pereira (krip...@imatronix.cl) wrote:
> Our stream replication slave server got out of sync so we need to base
> backup again.

If you do WAL archiving instead of depending on the WAL to exist on the
primary then a replica can catch up using WAL.  Having a WAL archive
also means you can do point-in-time-recovery for any point.
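
If you go that route with a tool like pgbackrest, for example, the
replica's restore_command ends up being roughly (the stanza name here
is just an example):

    restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'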

> In case of big databases, can we do incremental backups with pg_basebackup?

pg_basebackup doesn't support incremental backups, though there's been
discussion of adding some kind of support for it, check -hackers if
you're curious.

> Is there any alternative?

There's a few different alternatives for PG backup software, some of
which support incremental backups and restores.  I'm personally involved
with pgbackrest- https://pgbackrest.org though there's other options out
there also (wal-g, barman, and more).
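
With pgbackrest, for instance, once a stanza is set up an incremental
backup is simply something along the lines of:

    pgbackrest --stanza=main --type=incr backup

and a delta restore (only pulling files that have changed) is roughly:

    pgbackrest --stanza=main --delta restore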

Thanks,

Stephen




Re: upgrade and migrate

2019-12-04 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> On Wed, 2019-12-04 at 13:48 +0900, Michael Paquier wrote:
> > On Tue, Dec 03, 2019 at 10:32:22PM +, Julie Nishimura wrote:
> > > Hello, what is the best way to migrate from PostgreSQL 8.3.11 on
> > > x86_64-redhat-linux-gnu to PostgreSQL 9.6.16 on x86_64-pc-linux-gnu
> > > server, with minimal downtime?
> > > The caveat is the source has about 80 databases overall almost 30
> > > TB. I could migrate the smallest ones (up to 1 tb) using pg_dump and
> > > pg_restore, but the largest hot database is almost 17 tb, and I am
> > > not sure how to approach this effort in a better and efficient way?
> > 
> > pg_upgrade could be one way to go here.  That's not the scale pg_dump
> > would be very good at.  I would have personally avoided using pg_dump
> > above 10~20GB.  Depending on the downtime you are ready to accept,
> > a migration based on Slony could be something to investigate.
> 
> Right, Slony is the way to go, since pg_upgrade doesn't support 8.3.
> 
> I would upgrade to a version more recent than 9.6.

So...  there's a bit of history here.  pg_upgrade in 9.4 actually does
support upgrading from 8.3.X.  Support for upgrading from 8.3 was
removed in 2209b3923a7afe0b6033ecfea972219df252ca8e.
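
In other words, one possible path is a two-step upgrade: 8.3 to 9.4
with pg_upgrade, then 9.4 to the target version.  As a sketch only (the
paths are made up, and you'd want to test this carefully on a copy
first), each step looks roughly like:

    pg_upgrade -b /usr/pgsql-8.3/bin -B /usr/pgsql-9.4/bin \
               -d /data/pg83 -D /data/pg94 --link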

Thanks,

Stephen




Re: secure deletion of archived logs

2019-12-04 Thread Stephen Frost
Greetings,

* Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> with Oracle we use "backup archivelog all delete all input".
> this is a kind of atomic transaction.
> everything backuped for sure is deleted.
> 
> with Postgres we archive to a local host directory

... how?  Do you actually sync the files after you copy them with an
fsync to be sure that they're durably stored there?  If not, then
there's a pretty good chance that you'll lose some WAL if a crash
happens, because once your archive command returns success, PG will
remove its copy of the WAL file.

Hint: using 'cp' as an archive command is a very bad idea.

> we do a Networker backup of this directory afterwards and delete the archived 
> logs
> but this is not an atomic transaction
> so there is a small risk that something gets deleted which is not backuped

That would definitely be quite bad, particularly if a WAL file that was
needed for a backup to be consistent was removed or missed, as that
backup would no longer be valid then.

> how to you prevent this?

I would strongly recommend that you use a tool that's actually built for
the purpose of backing up PG systems, like pgbackrest or similar.
Writing your own custom code for managing WAL archives and backup sets
is likely to result in issues.
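
As an example of what that ends up looking like, with pgbackrest the
archive_command in postgresql.conf is typically along the lines of:

    archive_command = 'pgbackrest --stanza=main archive-push %p'

and the tool takes care of making sure the WAL file has been durably
stored in the repo before PG is allowed to remove its local copy.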

> Is there any backup tool which can do backups analogous Oracle?

There's quite a few different tools available for backing up PG systems,
with various features and performance- from simple things like
pg_basebackup (which you can set up to include all the WAL for the
backup to be consistent, though that doesn't do anything to help you
with managing WAL for PITR), to much more sophisticated tools like
pgbackrest, wal-g, and others that help with managing WAL and dealing
with expiring out backups and such.  The biggest thing is- don't try to
roll your own.

Thanks,

Stephen




Re: archiving question

2019-12-04 Thread Stephen Frost
Greetings,

* Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> When there is a Postgres archiver stuck because of filled pg_xlog and archive 
> directories...
> 
> ... and the pg_xlog directory had been filled with dozens of GBs of xlogs...
> 
> ...it takes ages until the archive_command had moved all xlogs from the 
> pg_xlog directory to the archive directory afterwards...
> 
> ... and you get crazy if you have a 8GB archive directory while the pg_xlog 
> directory had been pumped up to 100GB :(
> 
> 
> Any idea on this one?

Parallelizing the archive-push operation can be quite helpful to address
this.
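
If you're using pgbackrest, for example, that's roughly a matter of
turning on asynchronous archiving and giving archive-push more
processes- something along these lines in pgbackrest.conf (the values
here are just for illustration):

    [global]
    archive-async=y
    spool-path=/var/spool/pgbackrest

    [global:archive-push]
    process-max=4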

Thanks,

Stephen




Re: AW: secure deletion of archived logs

2019-12-09 Thread Stephen Frost
Greetings,

* Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> We use "rsync" on XFS with "wsync" mount mode. I think this should do the job?

No, that just makes sure that namespace operations are executed
synchronously; it doesn't provide any guarantee that the data has
actually been written out and sync'd.

> The tools mentioned will all do backup to disk.
> We are required to do backup to tape.

Back up to disk first and then tar to tape.

Thanks,

Stephen




Re: AW: AW: secure deletion of archived logs

2019-12-11 Thread Stephen Frost
Greetings,

* Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> > * Zwettler Markus (OIZ) (markus.zwett...@zuerich.ch) wrote:
> > > We use "rsync" on XFS with "wsync" mount mode. I think this should do the 
> > > job?
> > 
> > No, that just makes sure that namespace operations are executed 
> > synchronously,
> > that doesn't provide any guarantee that the data has actually been written 
> > out and
> > sync'd.
> 
> What else? "rsync" on XFS mounted wsync + execute "sync" afterwards?

I don't really agree with it, but pitrery uses 'dd' with 'conv=fsync'.
As far as I know, there isn't a way to force 'rsync' to run an fsync()
at the end, and executing a 'sync' afterwards, while it should work,
seems likely to cause you a lot more trouble.

In the end, I wouldn't suggest trying to hack up your own scripts to do
any of this- use one of the existing tools.  It's a lot more complicated
than I think you're appreciating.

Thanks,

Stephen




Re: READ UNCOMMITTED in postgres

2019-12-18 Thread Stephen Frost
Greetings,

* Matthew Phillips (mphillip...@gmail.com) wrote:
> With the current READ UNCOMMITTED discussion happening on pgsql-hackers
> [1], It did raise a question/use-case I recently encountered and could not
> find a satisfactory solution for. If someone is attempting to poll for new
> records on a high insert volume table that has a monotonically increasing
> id, what is the best way to do it? As is, with a naïve implementation, rows
> are not guaranteed to appear in monotonic order; so if you were to keep a
> $MAX_ID, and SELECT WHERE p_id > $MAX_ID, you would hit gaps. Is there a
> clean way to do this? I've seen READ UNCOMMITTED used for this with DB2.

There's the LISTEN/NOTIFY system, which at a high level is a better
approach than using a polling system.
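
A minimal sketch of that approach (table, channel, and function names
are all made up):

    -- In the consuming session:
    LISTEN new_queue_item;

    -- Fired on insert so consumers don't have to poll by id:
    CREATE FUNCTION notify_new_queue_item() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('new_queue_item', NEW.id::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER queue_notify AFTER INSERT ON queue
      FOR EACH ROW EXECUTE FUNCTION notify_new_queue_item();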

Thanks,

Stephen




Re: How can I pushdown of functions used in targetlist with FDW ?

2020-01-09 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> shigeo Hirose  writes:
> > How can I pushdown of functions used in targetlist with FDW ?
> 
> There is, AFAIK, no provision for that.  There's not a lot of
> reason to consider adding it either, because there's no reason
> to suppose that the remote node can run such a function any
> faster than the local node.  So the difficulty and risk of
> determining/assuming that f(x) on the local node is the same
> as f(x) on the remote node doesn't seem like it'd be repaid.

Well, for starters, isn't there something in the SQL/MED spec about
this..?

Next, we already do this for aggregates.

Also, I don't think I agree with this particular position- it's entirely
possible that the remote server is much less loaded/busy than the local
one and therefore it'd be beneficial, overall, to run that function on
the remote system and not the local one.  The function might even have
side-effects or use other objects in the system to run.  The function
may also return a much smaller result than pulling back the raw data (of
course, the opposite could also possibly be true).

I guess my gut feeling is that, in general, we should push down as much
of the query as possible, as we do for aggregates and joins and
conditionals.  I appreciate that there are some complications here when it
comes to figuring out if it's possible/reasonable to push down something
like, say, a volatile function that's locally defined and known to exist
on the remote.  Haven't got a particular idea how to address that
offhand but that doesn't change my feelings that we should have a way to
do this generally and that it would actually be useful to have.

Thanks,

Stephen




Re: How can I pushdown of functions used in targetlist with FDW ?

2020-01-09 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> There is, AFAIK, no provision for that.  There's not a lot of
> >> reason to consider adding it either, because there's no reason
> >> to suppose that the remote node can run such a function any
> >> faster than the local node.
> 
> > Also, I don't think I agree with this particular position- it's entirely
> > possible that the remote server is much less loaded/busy than the local
> > one and therefore it'd be beneficial, overall, to run that function on
> > the remote system and not the local one.
> 
> Or the reverse.  We have no way of estimating such effects.

Yeah, it'd be nice if there was an overall solution, but I don't know
that we have to solve that to support this.

> > The function might even have
> > side-effects or use other objects in the system to run.
> 
> In such a case, we *can't* be pushing down.  The entire foundational
> principle of this behavior is that we must be certain that a pushed-down
> operation has exactly the same semantics on either node.  Otherwise,
> the planner's choices are not optimizations but query behavior changes,
> and we can't have that.  If you've got a function like that to run
> on the remote end, you have to embed it in a remote view -- we simply
> don't have another option.

This is where things get complicated, because I would argue that we *can*
push down and in some cases we *have* to- but we definitely need to sort
out when to do which, and I think that's where the whole SQL/MED stuff
and FOREIGN FUNCTIONs come into play.

In other words, I see options like:

a) Function is a FOREIGN FUNCTION, therefore we must push it down
b) Function is only available locally and therefore we must run it
   locally
c) Function is available in *both* places and expected to have the same
   behavior

'a' and 'b' are pretty clear.  In an ideal world, we'd figure out where
the *best* place to run 'c' is and then run it there.  Perhaps there's a
way to set up a cost model for it, but at least at the moment, we only
have one cost for functions and it'd almost certainly be the same value
for local as for remote in the 'c' case, which means we need something
else.  For my 2c, I'd be inclined to use a heuristic of "if it's
actually possible for us to push it down, we should do so."  Of course
there'll be cases where it's terrible to do so and so it'd be good if
there was some way for users to tell us which to do (maybe we have a GUC
for it?  The enable_* mechanism is an unfortunately large hammer but it
might cover enough cases (enable_funcpushdown?)).
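
For completeness, the remote-view approach Tom mentions above- the way
to force a function to run remotely today- looks roughly like this (all
names invented):

    -- On the remote server:
    CREATE VIEW tbl_with_f AS
      SELECT id, f(x) AS fx FROM tbl;

    -- On the local server:
    CREATE FOREIGN TABLE tbl_with_f (id int, fx int)
      SERVER remote_srv OPTIONS (table_name 'tbl_with_f');

    -- f() now always runs on the remote side:
    SELECT id, fx FROM tbl_with_f WHERE id = 42;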

> > The function
> > may also return a much smaller result than pulling back the raw data (of
> > course, the opposite could also possibly be true).
> 
> Yeah, this is a legitimate point, but again we have no very good way
> of estimating which is better.

Yeah, I agree with that.

> In general, there's an awful lot of postgres_fdw's behavior that depends
> on the assumption that the remote and local servers are pretty
> interchangeable, not least that we take the remote's cost numbers at
> face value when preparing cost numbers for a foreign scan.  The only
> consideration that we really can reliably optimize push-down choices with
> is trying to reduce the volume of data transmitted, and we do that without
> consideration for whether the operations pushed across might take more or
> less time when run on the other server.  Maybe someday that could be
> improved, but it seems like a nontrivial research project involving a
> lot more moving parts than just this point.

I agree with all of that- but it seems like we've got people asking for
it, and I can understand why they are.  I'd therefore be inclined to
provide a way for users to get that behavior, and I'm even inclined to
say that pushing down should be the 'default' if everything else is
equal and it's possible to do so.  Figuring out a way to decide which is
likely to be better based on stats and other information could then be a
research project to improve on the simple heuristic.

Thanks,

Stephen



