Re: lower() and unaccent() not leakproof

2021-08-26 Thread Peter Eisentraut

On 26.08.21 06:52, David G. Johnston wrote:
> On Wednesday, August 25, 2021, Christophe Pettus wrote:
>> lower() and unaccent() (and most string functions) are not marked as
>> leakproof.  Is this due to possible locale / character encoding
>> errors they might encounter?
>
> I think you are partially correct.  It's due to the fact that error
> messages, regardless of the root cause, result in the printing of the
> input value in the error message as context; thus a leak exists via a
> violation of "It reveals no information about its arguments other than
> by its return value."


I think if you trace the code, you might find that lower() and upper() 
can't really leak anything.  It might be worth taking a careful look and 
possibly lifting this restriction.
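
(For context, the marking in question lives in pg_catalog.pg_proc and can be
changed by a superuser; an illustrative sketch of inspecting and lifting it,
not a recommendation:)

-- check which of these functions are currently marked leakproof
SELECT proname, proleakproof
FROM pg_catalog.pg_proc
WHERE proname IN ('lower', 'upper', 'unaccent');

-- a superuser could lift the restriction locally, at their own risk:
ALTER FUNCTION pg_catalog.lower(text) LEAKPROOF;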





Re: lower() and unaccent() not leakproof

2021-08-26 Thread Daniel Gustafsson
> On 26 Aug 2021, at 09:58, Peter Eisentraut wrote:
> 
> On 26.08.21 06:52, David G. Johnston wrote:
>> On Wednesday, August 25, 2021, Christophe Pettus wrote:
>>> lower() and unaccent() (and most string functions) are not marked as
>>> leakproof.  Is this due to possible locale / character encoding
>>> errors they might encounter?
>> I think you are partially correct.  It's due to the fact that error
>> messages, regardless of the root cause, result in the printing of the
>> input value in the error message as context; thus a leak exists via a
>> violation of "It reveals no information about its arguments other than
>> by its return value."
> 
> I think if you trace the code, you might find that lower() and upper() can't 
> really leak anything.  It might be worth taking a careful look and possibly 
> lifting this restriction.

Wouldn’t the difference in possible error messages in upper/lower be able to
leak whether the input is ASCII or wide chars, and/or the collation?

--
Daniel Gustafsson   https://vmware.com/





Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread hubert depesz lubaczewski
Hi,
I have the following case: local pg_dump (v15) connecting to remote
PostgreSQL (v12).

I'm trying to get just the schema (pg_dump -s). It's taking very long, which
is kinda OK given the long distance and latency, but I got curious and
checked the queries that pg_dump was running (select * from
pg_stat_activity where application_name = 'pg_dump').

And I noticed that many of these queries repeat many times.

The ones that I noticed were:
SELECT pg_catalog.format_type('2281'::pg_catalog.oid, NULL)
around the time that
SELECT
proretset,
prosrc,
probin,
provolatile,
proisstrict,
prosecdef,
lanname,
proconfig,
procost,
prorows,
pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
pg_catalog.pg_get_function_result(p.oid) AS funcresult,
proleakproof,
array_to_string(protrftypes, ' ') AS protrftypes,
proparallel,
prokind,
prosupport,
NULL AS prosqlbody
FROM
pg_catalog.pg_proc p,
pg_catalog.pg_language l
WHERE
p.oid = '60188'::pg_catalog.oid
AND l.oid = p.prolang

was called too.

It seems that for every function, pg_dump is getting its data, and then
runs format_type on each parameter/output type? I'm mostly guessing
here, as I didn't read the code.

Wouldn't it be possible to get all type formats at once, and cache them
in pg_dump? Or at the very least reuse already received information?

Unfortunately it seems I can't run pg_dump closer to the db server, and
the latency of queries is killing me.

It's been 15 minutes, and pg_dump (called: pg_dump -v -s -f schema.dump,
with env variables configuring the db connection) hasn't written even a
single byte to schema.dump.

depesz




Re: lower() and unaccent() not leakproof

2021-08-26 Thread Tom Lane
Peter Eisentraut  writes:
> On 26.08.21 06:52, David G. Johnston wrote:
>> On Wednesday, August 25, 2021, Christophe Pettus wrote:
>>> lower() and unaccent() (and most string functions) are not marked as
>>> leakproof.  Is this due to possible locale / character encoding
>>> errors they might encounter?

> I think if you trace the code, you might find that lower() and upper() 
> can't really leak anything.  It might be worth taking a careful look and 
> possibly lifting this restriction.

Generally speaking, we've been resistant to marking anything leakproof
unless it has a very small code footprint that can be easily audited.

In particular, anything that shares a lot of infrastructure with
not-leakproof functions seems quite hazardous.  Even if you go through
the code and convince yourself that it's OK today, innocent changes
to the shared infrastructure could break the leakproofness tomorrow.

regards, tom lane




Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread Tom Lane
hubert depesz lubaczewski  writes:
> It seems that for every function, pg_dump is getting its data, and then
> runs format_type on each parameter/output type? I'm mostly guessing
> here, as I didn't read the code.
> Wouldn't it be possible to get all type formats at once, and cache them
> in pg_dump? Or at the very least reuse already received information?

Send a patch ...

regards, tom lane




Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 10:02:07AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski  writes:
> > It seems that for every function, pg_dump is getting its data, and then
> > runs format_type on each parameter/output type? I'm mostly guessing
> > here, as I didn't read the code.
> > Wouldn't it be possible to get all type formats at once, and cache them
> > in pg_dump? Or at the very least reuse already received information?
> Send a patch ...

Yeah, that's not going to work, my C skills are next-to-none :(

I guess I'll have to wait till someone else decides it's a problem -
someone with the skills to do something about it.

Best regards,

depesz





Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread Tom Lane
hubert depesz lubaczewski  writes:
> On Thu, Aug 26, 2021 at 10:02:07AM -0400, Tom Lane wrote:
>> hubert depesz lubaczewski  writes:
>>> Wouldn't it be possible to get all type formats at once, and cache them
>>> in pg_dump? Or at the very least reuse already received information?

>> Send a patch ...

> Yeah, that's not going to work, my C skills are next-to-none :(
> I guess I'll have to wait till someone else decides it's a problem -
> someone with the skills to do something about it.

Well, you could move it forward by doing the legwork to identify which
queries are worth merging.  Is it really sane to do a global "select
format_type() from pg_type" query and save all the results on the client
side?  I wonder whether there are cases where that'd be a net loss.
You could do the experimentation to figure that out without necessarily
having the C skills to make pg_dump actually do it.
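
(For illustration, the global query under discussion might look something
like this - a sketch of the experiment, not pg_dump's actual code:)

SELECT t.oid, pg_catalog.format_type(t.oid, NULL) AS ftype
FROM pg_catalog.pg_type t;

-- Timing this (e.g. with psql's \timing) against N single-oid round trips
-- would show where the crossover is; on a database with a very large
-- pg_type, fetching everything up front could be the net loss suspected above.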

regards, tom lane




Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 10:20:29AM -0400, Tom Lane wrote:
> hubert depesz lubaczewski  writes:
> > On Thu, Aug 26, 2021 at 10:02:07AM -0400, Tom Lane wrote:
> >> hubert depesz lubaczewski  writes:
> >>> Wouldn't it be possible to get all type formats at once, and cache them
> >>> in pg_dump? Or at the very least reuse already received information?
> 
> >> Send a patch ...
> 
> > Yeah, that's not going to work, my C skills are next-to-none :(
> > I guess I'll have to wait till someone else decides it's a problem -
> > someone with the skills to do something about it.
> 
> Well, you could move it forward by doing the legwork to identify which
> queries are worth merging.  Is it really sane to do a global "select

Sure. On it. Will report back when I have more info.

Best regards,

depesz





Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread Adrian Klaver

On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
> Hi,
> I have the following case: local pg_dump (v15) connecting to remote
> PostgreSQL (v12).

So you are using a dev version of pg_dump or is that a typo?

> It's been 15 minutes, and pg_dump (called: pg_dump -v -s -f schema.dump,
> with env variables configuring the db connection) hasn't written even a
> single byte to schema.dump.

What happens if you run without the -v?

> depesz





--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 07:34:26AM -0700, Adrian Klaver wrote:
> On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
> > Hi,
> > I have the following case: local pg_dump (v15) connecting to remote
> > PostgreSQL (v12).
> So you are using a dev version of pg_dump or is that a typo?

Yes. I'm running pg_dump from my computer to a (very) remote db server.

> > It's been 15 minutes, and pg_dump (called: pg_dump -v -s -f schema.dump,
> > with env variables configuring the db connection) hasn't written even a
> > single byte to schema.dump.
> What happens if you run without the -v?

Well, I guess it works, but with no output I can't judge how fast.
Definitely doesn't seem to be going any faster.

Best regards,

depesz





Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread Adrian Klaver

On 8/26/21 7:35 AM, hubert depesz lubaczewski wrote:
> On Thu, Aug 26, 2021 at 07:34:26AM -0700, Adrian Klaver wrote:
>> On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
>>> Hi,
>>> I have the following case: local pg_dump (v15) connecting to remote
>>> PostgreSQL (v12).
>> So you are using a dev version of pg_dump or is that a typo?
>
> Yes. I'm running pg_dump from my computer to a (very) remote db server.

SSHing and dumping on the remote is out as a short term solution?

> Well, I guess it works, but with no output I can't judge how fast.
> Definitely doesn't seem to be going any faster.

Unknown slow, that didn't help.

> Best regards,
> depesz




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 07:46:46AM -0700, Adrian Klaver wrote:
> On 8/26/21 7:35 AM, hubert depesz lubaczewski wrote:
> > On Thu, Aug 26, 2021 at 07:34:26AM -0700, Adrian Klaver wrote:
> > > On 8/26/21 1:44 AM, hubert depesz lubaczewski wrote:
> > > > Hi,
> > > > I have the following case: local pg_dump (v15) connecting to remote
> > > > PostgreSQL (v12).
> > > So you are using a dev version of pg_dump or is that a typo?
> > 
> > Yes. I'm running pg_dump from my computer to (very) remote db server.
> SSHing and dumping on the remote is out as a short term solution?

As I mentioned in the original post - I can't run pg_dump closer to the
server. SSH is not available, at least for me.

Anyway - I got the dump, so I am good for now, but I think that this
could be improved, so I'll work on getting some stats on queries.

Best regards,

depesz





Re: lower() and unaccent() not leakproof

2021-08-26 Thread Peter Eisentraut

On 26.08.21 10:40, Daniel Gustafsson wrote:
> On 26 Aug 2021, at 09:58, Peter Eisentraut wrote:
>> On 26.08.21 06:52, David G. Johnston wrote:
>>> On Wednesday, August 25, 2021, Christophe Pettus wrote:
>>>> lower() and unaccent() (and most string functions) are not marked as
>>>> leakproof.  Is this due to possible locale / character encoding
>>>> errors they might encounter?
>>> I think you are partially correct.  It's due to the fact that error
>>> messages, regardless of the root cause, result in the printing of the
>>> input value in the error message as context; thus a leak exists via a
>>> violation of "It reveals no information about its arguments other than
>>> by its return value."
>>
>> I think if you trace the code, you might find that lower() and upper()
>> can't really leak anything.  It might be worth taking a careful look and
>> possibly lifting this restriction.
>
> Wouldn't the difference in possible error messages in upper/lower be able
> to leak whether the input is ASCII or wide chars, and/or the collation?


Yeah, but there aren't any error messages that relate to the argument 
string, if you look through the code.  There isn't any "could not find 
lower case equivalent of %s" or anything like that.  Once you have found 
the right collation and locale and server encoding and have allocated 
some memory, the conversion always succeeds.


The collation is not secret, it's determined by parse analysis.




Re: lower() and unaccent() not leakproof

2021-08-26 Thread Peter Eisentraut

On 26.08.21 16:00, Tom Lane wrote:
> Generally speaking, we've been resistant to marking anything leakproof
> unless it has a very small code footprint that can be easily audited.
>
> In particular, anything that shares a lot of infrastructure with
> not-leakproof functions seems quite hazardous.  Even if you go through
> the code and convince yourself that it's OK today, innocent changes
> to the shared infrastructure could break the leakproofness tomorrow.

I think the complexity of the implementation of upper() and lower() is
on the same order as bttextcmp() and similar, so it wouldn't be totally
out of scope.






Re: lower() and unaccent() not leakproof

2021-08-26 Thread Daniel Gustafsson
> On 26 Aug 2021, at 16:59, Peter Eisentraut wrote:
> On 26.08.21 10:40, Daniel Gustafsson wrote:

>> Wouldn’t the difference in possible error messages in upper/lower be able to
>> leak whether the input is ascii or wide chars, and/or the collation?
> 
> Yeah, but there aren't any error messages that relate to the argument string, 
> if you look through the code.  There isn't any "could not find lower case 
> equivalent of %s" or anything like that.

Correct.  My reading of "It reveals no information about its arguments other
than by its return value" was that error messages indicating different code
paths based on argument structure weren't allowed.  That might have been a
bit too lawyerly an interpretation, though.

--
Daniel Gustafsson   https://vmware.com/





Re: Can we get rid of repeated queries from pg_dump?

2021-08-26 Thread hubert depesz lubaczewski
On Thu, Aug 26, 2021 at 10:20:29AM -0400, Tom Lane wrote:
> Well, you could move it forward by doing the legwork to identify which
> queries are worth merging.  Is it really sane to do a global "select
> format_type() from pg_type" query and save all the results on the client
> side?  I wonder whether there are cases where that'd be a net loss.
> You could do the experimentation to figure that out without necessarily
> having the C skills to make pg_dump actually do it.

So, I got some info.

First, some stats. The DB contains:

- 14 extensions
- 1 aggregate
- 107 functions
- 5 schemas
- 5 sequences
- 188 logged tables
- 1 unlogged table
- 206 "normal" indexes
- 30 unique indexes
- 15 materialized views
- 16 triggers
- 87 types
- 26 views

pg_dump -s of it is ~ 670kB.

Interestingly, while dumping (pg_dump -s -v), we can see progress output,
and then, after:


...
pg_dump: reading publications
pg_dump: reading publication membership
pg_dump: reading subscriptions
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
pg_dump: saving search_path = 


The progress output stops there (it had been visible in the console until
then), and in the pg logs I see queries like:

#v+
SELECT
proretset,
prosrc,
probin,
provolatile,
proisstrict,
prosecdef,
lanname,
proconfig,
procost,
prorows,
pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
pg_catalog.pg_get_function_result(p.oid) AS funcresult,
proleakproof,
array_to_string(protrftypes, ' ') AS protrftypes,
proparallel,
prokind,
prosupport,
NULL AS prosqlbody
FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
WHERE p.oid = '43875'::pg_catalog.oid AND l.oid = p.prolang
#v-

Now for query stats.

To dump it all, pg_dump needed 9173 queries (logged by
log_min_duration_statement = 0 for this user).
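
(A sketch of how such per-user statement logging can be set up - the role
name here is hypothetical:)

#v+
ALTER ROLE dump_user SET log_min_duration_statement = 0;
#v-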

I extracted all queries to separate files, and computed some stats. In total there were
only 4257 unique queries.

Then I checked for repeated queries. Top 10 most repeated offenders were:

615 times : SELECT pg_catalog.format_type('25'::pg_catalog.oid, NULL)
599 times : SELECT pg_catalog.format_type('23'::pg_catalog.oid, NULL)
579 times : SELECT pg_catalog.format_type('2281'::pg_catalog.oid, NULL)
578 times : SELECT pg_catalog.format_type('41946'::pg_catalog.oid, NULL)
523 times : SELECT pg_catalog.format_type('701'::pg_catalog.oid, NULL)
459 times : SELECT pg_catalog.format_type('42923'::pg_catalog.oid, NULL)
258 times : SELECT pg_catalog.format_type('16'::pg_catalog.oid, NULL)
176 times : SELECT pg_catalog.format_type('19'::pg_catalog.oid, NULL)
110 times : SELECT pg_catalog.format_type('21'::pg_catalog.oid, NULL)
106 times : SELECT pg_catalog.format_type('42604'::pg_catalog.oid, NULL)

In total, there were 5000 queries:
SELECT pg_catalog.format_type('[0-9]+'::pg_catalog.oid, NULL)

But there were only 83 separate oids that were scanned.
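
(Hypothetically, all of those could be fetched in one round trip and cached
client-side; a sketch using the top oids from above:)

#v+
SELECT t.oid, pg_catalog.format_type(t.oid, NULL) AS ftype
FROM pg_catalog.pg_type t
WHERE t.oid IN (25, 23, 2281, 41946, 701, 42923, 16, 19, 21, 42604);
-- ...extended to cover all 83 oids seen during the dump
#v-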

The only other repeated command was:
SELECT pg_catalog.set_config('search_path', '', false);
and it was called only twice.

Based on my reading of the queries in order, they seem to follow this pattern:

One call for:

SELECT proretset, prosrc, probin, provolatile, proisstrict, prosecdef,
lanname, proconfig, procost, prorows,
pg_catalog.pg_get_function_arguments(p.oid) AS funcargs,
pg_catalog.pg_get_function_identity_arguments(p.oid) AS funciargs,
pg_catalog.pg_get_function_result(p.oid) AS funcresult,
proleakproof, array_to_string(protrftypes, ' ') AS protrftypes,
proparallel, prokind, prosupport, NULL AS prosqlbody
FROM pg_catalog.pg_proc p, pg_catalog.pg_language l
WHERE p.oid = 'SOME_NUMBER'::pg_catalog.oid AND l.oid = p.prolang

and then one or more:

SELECT pg_catalog.format_type('SOME_NUMBER'::pg_catalog.oid, NULL)

In one case, after the pg_proc query, there were 94 consecutive
pg_catalog.format_type queries.

I hope it helps.

Best regards,

depesz





Re: lower() and unaccent() not leakproof

2021-08-26 Thread Tom Lane
Peter Eisentraut  writes:
> I think the complexity of the implementation of upper() and lower() is 
> on the same order as bttextcmp() and similar, so it wouldn't be totally 
> out of scope.

True.  But you'll recall that the decision to mark bttextcmp() and
cohorts as leakproof was not made without fear.  IMV, that decision
did not rest simply on code review but on two arguments about why
we should take the risk:

* The query-optimization usefulness of having those be leakproof
is extremely high.

* btree comparison functions should really not have any user-reachable
failure modes (which comes close to being the definition of leakproof).
If one did, that would mean there were legal values of the type that
couldn't be put into a btree index.
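
(To illustrate the optimization stakes - a sketch with made-up table and
view names: with a security_barrier view, only leakproof functions may be
pushed down below the view's security qual, so a non-leakproof function in
the user's WHERE clause can block index use:)

CREATE TABLE accounts (id int, owner text, balance numeric);
CREATE VIEW my_accounts WITH (security_barrier) AS
SELECT * FROM accounts WHERE owner = current_user;
-- because lower() is not marked leakproof, this predicate cannot be
-- evaluated until after the view's qual has filtered the rows:
EXPLAIN SELECT * FROM my_accounts WHERE lower(owner) = 'alice';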

Maybe similar arguments can be made about upper/lower, but I think
it's a far weaker case.  As for unaccent, the fact that it relies
on user-controllable definitions seems to me to make it almost
certainly unsafe to be leakproof.

regards, tom lane




Re: Manual failover cluster

2021-08-26 Thread Ninad Shah
Hi Saul,

Hope you are doing well. My apologies for the delayed response.

pgBackRest helps build streaming replication. When performing a role
reversal (switchover), it is mandatory to set recovery_target_timeline to
'latest' in recovery.conf (in the data directory). The steps to perform a
switchover are as follows.

1) Stop the master database
2) Promote the slave database
3) Prepare the recovery.conf file on the stopped master node, and set
recovery_target_timeline to 'latest' in that file
4) Start the stopped master database; it will automatically come up as a
slave
5) Check the status of the database

Additionally, entries in pg_hba.conf are required; an illustrative sketch of
both files is below.
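
(Illustrative only - host names, stanza name, and addresses are
hypothetical; this is PG 10/11 syntax, as in v12+ these settings moved
into postgresql.conf with a standby.signal file:)

# recovery.conf on the old master, now becoming a slave
standby_mode = 'on'
primary_conninfo = 'host=new-master port=5432 user=replicator'
restore_command = 'pgbackrest --stanza=main archive-get %f "%p"'
recovery_target_timeline = 'latest'

# pg_hba.conf on the new master: allow the new slave's replication user
host  replication  replicator  10.0.0.0/24  md5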

Hope this helps.


Regards,
Ninad Shah


On Mon, 23 Aug 2021 at 23:12, Saul Perdomo  wrote:

> Sorry, I misspoke there - I meant to say that since one should not count
> on the standby-failover process to always run smoothly (whether it's due to
> hardware, operator, automated scripts, or software issues), DB backups
> should also be in place if at all possible.
>
> On Mon, Aug 23, 2021 at 1:37 PM Saul Perdomo 
> wrote:
>
>> Hi Moishe,
>>
>> Since we use pgbackrest ourselves, this is the process I followed to set
>> up something similar on PG 10:
>>
>>
>> https://pgstef.github.io/2018/11/28/combining_pgbackrest_and_streaming_replication.html
>>
>> (Not knowing much [if at all] about the reason for your requirements, I
>> would recommend looking into enabling PITR backups in addition to the hot
>> standby[s], which as you've seen are not bulletproof.)
>>
>> Anyway, just wanted to add that your experience roughly matches ours --
>> it seems that an attempt to promote the standby isn't always smooth. My
>> sample size is almost certainly smaller than yours, about half a dozen
>> attempts. In the past I've attributed this to a poor understanding on my
>> part of the behind-the-scenes of the process, and chalked it up to having
>> made a mistake or other in the manual steps (mainly in the order they are
>> executed). That being said, if you find a way to faithfully reproduce the
>> issue, I'm sure the community will want to know, there is certainly an
>> expectation that the failover is reliable from the PG software side of
>> things, as long as there are no hardware nor operator issues! Again, not
>> knowing a lot about your setup, my first instinct would be to troubleshoot
>> your automated scripts, you might find that you need to change the order
>> things are run when on server B vs server A, for example.
>>
>> On Fri, Aug 20, 2021 at 9:23 AM Hispaniola Sol 
>> wrote:
>>
>>> Team,
>>>
>>> I have a pg 10 cluster with a master and two hot-standby nodes. There is
>>> a requirement for a manual failover (nodes switching the roles) at will.
>>> This is a vanilla 3 node PG cluster that was built with WAL archiving
>>> (central location) and streaming replication to two hot standby nodes.  The
>>> failover is scripted in Ansible. Ansible massages and moves around the
>>> archive/restore scripts, the conf files and the trigger and calls `
>>> pg_ctlcluster` to start/stop. This part _seems_ to be doing the job fine.
>>>
>>> The issue I am struggling with is the apparent fragility of the process
>>> - all 3 nodes will end up in a "good" state after the switch only every
>>> other time. Other times I have to rebase the hot-standby from the new
>>> master with pg_basebackup. It seems the issues are mostly with the nodes
>>> that end up as slaves after the role switch runs.
>>> They get errors like mismatch in timelines, recovering from the same WAL
>>> over and over again, invalid resource manager ID in primary checkpoint
>>> record, etc.
>>>
>>> In this light, I am wondering - using what's offered by PostgreSQL
>>> itself, i.e. streaming WAL replication with log shipping - can I expect
>>> this kind of failover to be 100% reliable on the PG side? Is anyone
>>> doing this reliably on PostgreSQL 10.1x?
>>>
>>> Thanks !
>>>
>>> Moishe
>>>
>>