jsonb: unwrapping text

2021-10-27 Thread tomas
Hi,

I'm trying to extract text from a jsonb 'string'. Simply casting
gives me the string wrapped in quotes:

  foo=# select '"foo"'::jsonb::text;
   text  
  -------
   "foo"
  (1 row)

This, of course, makes kind of sense, since it /is/ the JSON's
textual representation.

What is the canonical way to get at the string's content itself?

I've found out that one can treat a string singleton as if it
were an array:

  foo=# select '"foo"'::jsonb ->> 0;
   ?column? 
  ----------
   foo
  (1 row)

which conveniently returns the right type. My question: can I rely
on that, or am I missing a much more obvious option?

(I try to stay compatible down to PostgreSQL 11, or even 9 when possible.
Conservative customers and all that).

Thanks for any insights, cheers
-- tomás




Re: jsonb: unwrapping text

2021-10-28 Thread tomas
On Wed, Oct 27, 2021 at 04:18:20PM -0700, David G. Johnston wrote:
> On Wed, Oct 27, 2021 at 11:58 AM  wrote:
> 
> >
> > I've found out that one can treat a string singleton as if it
> > were an array:
> >
> >   foo=# select '"foo"'::jsonb ->> 0;
> >?column?
> >   --
> >foo
> >   (1 row)
> >
> > which conveniently returns the right type. My question: can I rely
> > on that, or am I missing a much more obvious option?
> >
> >
> Not sure if this exact behavior is trustworthy - but you are on the right
> path. Place the value into either a json array or json object and then use
> the text versions of the accessor methods to get the json value to pass
> through the decoding routine.

Thanks a bunch :)

I know that, behind the scenes, jsonb scalars (didn't check that
for json) are actually represented as one-element arrays, but was unsure
how much this can be relied on as "official interface" :-)

This leaves us with

  foo=# select jsonb_build_array('"foo"'::jsonb)->>0;
   ?column? 
  ----------
   foo
  (1 row)

...which feels somewhat roundabout, but hey, it actually works.

What also seems to work is #>> with an empty path specifier, i.e.

  select '"foo"'::jsonb #>> '{}';

...but all of them feel somewhat hacky. I'll post a request with the
form linked in [1], let's see :-)

Thanks again for your assessment, cheers
 - t




PQexecParams, placeholders and variable lists of params

2021-11-23 Thread tomas
Hi,

PQexecParams expects a query string with "$1", "$2"... placeholders,
which refer to the corresponding entries in the param list. This keeps
SQL injection at bay.

Is there a way to express "variable length" lists? IOW, if I want to do
a query like

  "SELECT * FROM customers WHERE id IN ($1, $2) AND name like $3;"

is there a way to do that without knowing beforehand how many values go
into the IN list?

It would be very welcome for you to rub my nose against the place in The
Fine Manual where I could have found that :-)

Thanks & cheers
 - tomás




Re: Regex for Word space Word space Word ....

2021-11-23 Thread tomas
On Tue, Nov 23, 2021 at 09:58:00AM +, Shaozhong SHI wrote:
> Is there any regex for Word space Word space Word and more?

It isn't very clear what you want to achieve. From the other mails in
this thread I understand that your words start with an uppercase char
and continue with lowercase chars. Is that right?

You want exactly one space between words, or more than one?

What is this "...and more"? Arbitrary repetitions?

Which kind of regular expressions do you want to use? POSIX?

If all the answers to the above are "yes", you might try something like

  "(?:[[:upper:]][[:lower:]]*[[:space:]]+)*[[:upper:]][[:lower:]]*"

(Caveat: untested). This would match a single word or more than one
word, separated by one or more spaces, where a word starts with one
upper-case character and continues with zero or more lowercases.
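
For instance (again untested, and "places"/"placename" are just example
names), anchored so that the whole value has to match:

  SELECT *
    FROM places
   WHERE placename ~ '^(?:[[:upper:]][[:lower:]]*[[:space:]]+)*[[:upper:]][[:lower:]]*$';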

HTH
 - t




Re: PQexecParams, placeholders and variable lists of params

2021-11-23 Thread tomas
On Tue, Nov 23, 2021 at 06:39:27PM +0500, Дмитрий Иванов wrote:
> Hi
> A function cannot have an undefined signature, but can accept an array of
> arguments:

I see. So you propose passing an array as a single param to
PQexecParams, in PostgreSQL's syntax for arrays, e.g. "{42, 45, 50}".

Makes sense. The problem is that, again, the application would be
responsible for making sure the individual values don't contain nasty
stuff (for example, if they are strings) before consolidating them into
one PostgreSQL array literal.

I was hoping to get away "on the cheap" on this, letting PostgreSQL take
care of the injection avoidance ;-)

I'm converging on building the query dynamically, but still with
placeholders. I /know/ how many values are coming, and how many
placeholders have been used so far in the query, so it'd be fairly easy
to just insert "$m"..."$n" as needed.

Thanks a lot, Dmitri

Cheers
 - t




Re: PQexecParams, placeholders and variable lists of params

2021-11-23 Thread tomas
On Tue, Nov 23, 2021 at 10:43:03AM -0500, Tom Lane wrote:
> "David G. Johnston"  writes:
> > On Tue, Nov 23, 2021 at 7:21 AM  wrote:
> >> Makes sense. Problem is, that, again, the application would be
> >> responsible of making sure the individual values don't contain nasty
> >> stuff (for example, if they are strings) before consolidating them to
> >> one PostgreSQL array literal.
> 
> > So long as you actually pass the literal value via a parameter the worst
> > problem you can have is a syntax error in converting the literal into
> > whatever type is being cast to.
> 
> PG's array quoting rules are odd enough that I can sympathize with not
> wanting to deal with them.  (Although, if you only have to build an
> array and not parse one, taking the always-quote-even-if-not-necessary
> approach makes it easier.)
> 
> I don't see many other alternatives though.  *Somehow* you have to
> separate one value from the next.  If you don't want to pass 'em as
> distinct parameters, then you have to obey some kind of composite-value
> syntax.

Yes, that is my conclusion, too. Tentatively, I'll go with dynamically
building the query string, but with "$n" placeholders -- counting args
as I go -- and passing the args to PQexecParams.

This seems to afford injection protection in exchange for minimal fuss.

Thank you all for your input!

Cheers
 - t




Re: PQexecParams, placeholders and variable lists of params

2021-11-23 Thread tomas
On Tue, Nov 23, 2021 at 05:14:44PM +0100, Daniel Frey wrote:
> > On 23. Nov 2021, at 16:43, Tom Lane  wrote:
> > 
> > PG's array quoting rules are odd enough that I can sympathize with not
> > wanting to deal with them.  (Although, if you only have to build an
> > array and not parse one, taking the always-quote-even-if-not-necessary
> > approach makes it easier.)
> > 
> > I don't see many other alternatives though.  *Somehow* you have to
> > separate one value from the next.  If you don't want to pass 'em as
> > distinct parameters, then you have to obey some kind of composite-value
> > syntax.
> 
> Would it be possible to extend PQexecParams() et.al. like this:
> 
> You currently have paramValues[], paramLengths[], and paramFormats[] (plus 
> other parameters that I'll ignore here).
> 
> The format may currently be 0 or 1 (TEXT or BINARY). What if we allow 2 for 
> ARRAY? The corresponding  length then specifies how many parameters following 
> are part of the array. The value should point to a structure, that contains 
> pointers to the values, lengths, and formats of the elements. This also 
> allows nested arrays.

That sounds attractive; I think for my particular case it'd be
overengineering, though...

> If the client library knows that the server is too old to understand it, it 
> may temporarily assemble a string for those (correctly escaped) values and 
> replace the entries in the original values/lengths/formats arrays temporarily 
> before passing it to the old PQexecParams() implementation.
> 
> If the server is new enough the protocol itself can be extended to send the 
> array more efficiently instead of quoting and copying data around.
> 
> This would also hide the quoting rules for arrays nicely, as it doesn't 
> require additional methods for escaping. (Currently, escaping for arrays is 
> different from other escaping methods, it needs to be done manually and, 
> frankly, it's a PITA).

...but in the general case it sounds useful, yes :)

Cheers
 - t




Re: dirty_ratio & dirty_background_ratio settings with huge memory

2018-03-06 Thread Tomas Vondra


On 03/06/2018 01:16 PM, Laurenz Albe wrote:
> pinker wrote:
>> I've got cutomer with really huge RAM, now it's:
>>  total   used   free sharedbuffers cached
>> Mem:   31021113052596  49515 2088019922961185
>> -/+ buffers/cache:  904183011693
>> Swap: 8191  1   8190
>>
>> (free -m)
>>
>> and before it was twice more (6TB).
>>
>> and trying to figure out how to set dirty_ratio & dirty_background_ratio
>> parameters. Even for normal sized server those default sizes are to high,
>> but now would be ridiculously huge, respectively 1,24TB and 300GB. 
>> I'm thinking about 1 percent for dirty_ratio and not using
>> dirty_background_ratio because it's not possible to set it below 1% but to
>> set dirty_background_bytes instead to about 1-2GB.
>> Maybe someone has got other experience with RAM of this size and those
>> settings?
> 
> Yes, you should set vm.dirty_background_bytes and vm.dirty_bytes
> and not use the *_ratio settings.
> 
> 2 GB for vm.dirty_background_bytes and 1 GB for vm.dirty_bytes sounds fine.
> 

It should be the other way around: dirty_background_bytes < dirty_bytes

regards
-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Tomas Vondra


On 03/21/2018 05:09 PM, Alessandro Aste wrote:
> Hi there, we are using postgresql 10.3 and we're facing an issue with a
> query. The query (full query below)  completes only  when: 
> 
> 1 - LIMIT 10 is removed
> or
> 2 -  show max_parallel_workers_per_gather  is set to 0, so parallel
> processing is disabled.
> 
> With  max_parallel_workers_per_gather   set to the default value (8) I'm
> not even able to get the query plan.
> 
> Notes: 
> 
>   * We're experiencing the issue in any server of ours but I've
> reproduced the issue in a fresh restored database with full
> vacuum/reindex of the tables.
>   * We didn't touch any parameter concering the parallel processing,
> we're running the defaults: 
> 
> 
> cmdstaging=# show max_parallel_workers_per_gather ;
>  max_parallel_workers_per_gather
> -
>  8
> (1 row)
> 
> cmdstaging=# show max_worker_processes ;
>  max_worker_processes
> --
>  8
> (1 row)
> 
> cmdstaging=# show max_parallel_workers;
>  max_parallel_workers
> --
>  8
> (1 row)
> 
> 
> 
> 
> The query completes only omitting the LIMIT clause or when I disable
> parallel processing: 
> 
>  id   | vendor_id | gaa |   pop_name    | pop_status | pop_location |
> pop_provider_id | pop_provider |     pop_street      | pop_city | pop
> _postal_code | pop_state | pop_country | pop_country_id
> +---+-+---++--+-+--+-+--+
> -+---+-+
>  684807 |     12346 |     | GTT/POP/LON1T | Active     | LON1T        | 
>          12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
>  2AA         |           | GB          |            219
> (1 row)
> 
> Time: 4374.759 ms (00:04.375)
> cmdstaging=# show max_parallel_workers_per_gather ;
>  max_parallel_workers_per_gather
> -
>  0
> (1 row)
> 
> Time: 0.097 ms
> 
> 
> Otherwise it just keep running for forever. 
> 

When you say "running forever" is it actually using CPU, or does it get
stuck on something?

> 
> This is the full query: 
> 
> 
> SELECT * FROM (
> SELECT
> seg.circuit_id AS id,
> vendor_gtt_pop.vendor_id,
> CASE WHEN vendor_gtt_pop.push_to_gaa = 1 THEN 'Y' END as gaa,
> pop.gii_circuitid AS pop_name,
> cst.label AS pop_status,
> seg.a_company_name AS pop_location,
> seg.vendor_id AS pop_provider_id,
> seg.vendor_name AS pop_provider,
> cs.address1 AS pop_street,
> cs.city AS pop_city,
> cs.postal_code AS pop_postal_code,
> cs.state AS pop_state,
> csc.code AS pop_country,
> cs.country_id AS pop_country_id
> FROM (
> SELECT c.gii_circuitid, max(so.id) AS service_order_id
> FROM service_order so
> join circuit c on c.product_id=so.product_id
> join master_service_order mso on mso.id=so.master_service_order_id
> WHERE NOT (so.ordertype_id = 2 AND so.status <> 999) AND
> NOT (so.ordertype_id = 3 AND so.status <> 999) AND
> c.status >= 20 AND
> c.status not in (160,999) AND
> mso.client_id=11615 AND
> c.service_description=28 AND
> c.status!=160
> GROUP BY c.gii_circuitid
> ) pop
> JOIN service_order so ON so.id = pop.service_order_id
> left JOIN client_site cs on cs.id=so.a_site_id
> left JOIN country csc on csc.id=cs.country_id
> JOIN circuit c ON so.product_id=c.product_id
> JOIN circuit_status cst ON cst.id=c.status
> JOIN (
> SELECT c.id AS circuit_id, sg.id AS segment_id, c.pop_support_vendor_id AS vendor_id,
> v.name AS vendor_name, sg.a_company_name
> FROM segment sg
> JOIN circuit_layout cl ON cl.segment_id = sg.id AND cl.ordinal = 1
> JOIN circuit c ON c.id = cl.circuit_id
> JOIN vendor v ON v.id = c.pop_support_vendor_id
> ) seg ON seg.circuit_id = c.id
> JOIN vendor_gtt_pop on vendor_gtt_pop.gtt_pop_id = seg.circuit_id
> ) foo where vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id LIMIT 10
> 
> 
> 
> Execution plan with  max_parallel_workers_per_gather =0 , 
> max_parallel_workers_per_gather =8 and no LIMIT clause : 
> 
> 
We really need to see the execution plan that causes issues, i.e.
max_parallel_workers_per_gather=8 with LIMIT clause. Plain explain
(without analyze), at least.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-21 Thread Tomas Vondra

On 03/21/2018 08:44 PM, Alessandro Aste wrote:
> Thanks for your reply Tomas.  The query just got stuck for forever.  I
> observed no CPU spikes, it is currently running and I see 89 of the CPU
> idle. I'm using a relative powerfull machine (120 GB ram, 56 CPU).  
> 

That doesn't really answer the question, I'm afraid. I suppose "89 of
CPU" means that 89% idle in total, but 11% with 56 CPUs still means
about 6 cores 100% busy. But maybe you meant something else?

Is there something else running on the machine? If you look at "top", are
the processes (the one you're connected to and the parallel workers)
doing something on the CPU?

> 
> Plain analyze as requested. : 
> 

I don't see anything obviously broken with the query plan, and it's
difficult to compare with the other plans because they are quite different.

But there's one part of the plan interesting:

 Limit  (cost=253523.56..253523.57 rows=1 width=176)
   ->  Sort  (cost=253523.56..253523.57 rows=1 width=176)
 Sort Key: c_2.gii_circuitid, c_1.id
 ->  Nested Loop  (cost=33190.89..253523.55 rows=1 width=176)
   Join Filter: (c_1.id = c.id)
   ->  Nested Loop  (cost=31724.87..31736.29 rows=1 ...)
 ...
   ->  Gather  (cost=1466.02..221787.23 rows=3 width=75)
 Workers Planned: 5
 ->  Hash Join  (cost=466.02..220786.93 rows=1 ...)
 ...

That is, there's a Gather on the inner side of a Nested Loop. I wonder
if that might cause issues in case of under-estimate (in which case we'd
be restarting the Gather many times) ...


BTW one of the plans you sent earlier is incomplete, because it ends
like this:

->  Nested Loop  (cost=42469.41..42480.82 rows=1 width=85) (...)
Join Filter: (c.status = cst.id)
Time: 3016.688 ms (00:03.017)

That is, it's missing the part below the join.


That being said, I'm not sure what's the issue here. Can you prepare a
self-contained test case that we might use to reproduce the issue? For
example by dumping the relevant part of the tables?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Tomas Vondra


On 03/22/2018 11:33 AM, Alessandro Aste wrote:
> Tomas, thank you. This machine is a bare metal server running only a
> staging postgresql 10.3 instance. Nobody is using it beside me. 
> 
> I'm attaching 4 files.  
> 
> every_30_seconds_top_stats_during_query.txt    - this is a caputure of
> the top command every 30 seconds(more or less) for 10+ minutes while I'm
> running the query. Let me know if this helps to answere your question.
> EXPLAIN_WITH_LIMIT_AND_max_parallel_workers_per_gather_SET_TO_0.txt   - 
> query plan  with full query and  max_parallel_workers_per_gather  force
> to 0. Full output.
> EXPLAIN_WITH_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.txt  - 
>  query plan with full query and  default parellel processing settings. 
> Full output.
> EXPLAIN_WITHOUT_LIMIT_AND_DEFAULT_PARALLEL_PROCESSING_SETTINGS.tx  -
> query plan of the query omitting the LIMIT clause and default parellel
> processing settings. Full output.
> 

OK. Looking at the top output, I see this:

   PID USER   VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
104880 postgres  30.8g 1.9g 1.9g R 92.0  1.5  15:15.60 postmaster
111732 postgres  30.8g 476m 473m R 88.2  0.4   0:00.47 postmaster
111730 postgres  30.8g 473m 470m R 86.4  0.4   0:00.46 postmaster
111731 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111733 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111734 postgres  30.8g 476m 473m R 86.4  0.4   0:00.46 postmaster
111728 root  15824 1912  828 R  3.8  0.0   0:00.04 top

That means it certainly is not stuck, it's simply doing a lot of work on
CPU. The question is why and what it's doing ...

Can you collect some CPU profiles using perf? There's a howto here:

https://wiki.postgresql.org/wiki/Profiling_with_perf

But in short - install perf, install debuginfo packages for postgres,
and then do

perf record -a -g -s sleep 60

while running the query. Once the profile data is collected, do

perf report > report.txt

and share the report.txt with us (well, if it's very large you may need
to only cut the first 1000 lines or so).

That should tell us in which functions most of the time is spent. That
will give us some hints, hopefully.

> 
> For what concerns the  self-contained test case - I'll do my best to
> prepare it.
> 

Yeah, that would be helpful.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-22 Thread Tomas Vondra
On 03/22/2018 11:29 PM, Alessandro Aste wrote:
> Thanks Tomas. We're currently building postgres from source. In order to
> enable symbols, you want me to re-configure postres with  --enable-debug
> then run perf?
> 

Yes.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Postgresql 10.3 , query never completes if LIMIT clause is specified and paralle processing is on

2018-03-28 Thread Tomas Vondra
Hi,

the perf profiles look really weird - clearly, there's a lot of lock
contention, because the top item is this

 13.49%  13.40%  postmaster   postgres  [.]   LWLockAcquire
 |

 ---LWLockAcquire



That's a sign of lock contention - not sure which one, though. And then
there's heap_hot_search_buffer

12.65% 7.92%  postmaster   postgres [.] heap_hot_search_buffer
 |

 ---heap_hot_search_buffer


So either there's a table with many HOT-updated rows (repeatedly), or we
search the HOT chains very frequently for some reason. Considering it
only affects the non-LIMIT case, I'd guess the latter.

There also seem to be quite a few page faults, for some reason.

Trying to reproduce this without the actual data is rather futile. We
need some sort of reproducer (sample data to test the query on).

regards

On 03/28/2018 10:30 AM, Alessandro Aste wrote:
> Hello,  any news ?
> 
> Thank you,
> 
> Alessandro.
> 
> On Fri, Mar 23, 2018 at 8:22 PM, Alessandro Aste
> <alessandro.a...@gmail.com> wrote:
> 
> PS , in the meanwhile I discovered a 2nd workaround(beside disabling
> parallel processing) . I added offset  0 to the subquery , and,
> according to the documentation, “OFFSET 0 is the same as omitting
> the OFFSET clause”
> - https://www.postgresql.org/docs/current/static/queries-limit.html
> 
> cmd3dev=# show max_parallel_workers_per_gather ;
> 
> *max_parallel_workers_per_gather*
> 
> *-*
> 
> *8*
> 
> (1 row)
> 
>  
> 
> cmd3dev=# \timing
> 
> Timing is on.
> 
> cmd3dev=#  SELECT * FROM (SELECT seg.circuit_id AS id,
> vendor_gtt_pop.vendor_id, CASE WHEN vendor_gtt_pop.push_to_gaa = 1
> THEN 'Y' END as gaa, pop.gii_circuitid AS pop_name, cst.label AS
> pop_status, seg.a_company_name AS pop_location, seg.vendor_id AS
> pop_provider_id, seg.vendor_name AS pop_provider, cs.address1 AS
> pop_street, cs.city AS pop_city, cs.postal_code AS pop_postal_code,
> cs.state AS pop_state, csc.code AS pop_country, cs.country_id AS
> pop_country_id FROM ( SELECT c.gii_circuitid, max(so.id) AS
> service_order_id FROM service_order so join circuit c on
> c.product_id=so.product_id join master_service_order mso on
> mso.id=so.master_service_order_id WHERE NOT (so.ordertype_id = 2 AND
> so.status <> 999) AND NOT (so.ordertype_id = 3 AND so.status <> 999)
> AND c.status >= 20 AND c.status not in (160,999) AND
> mso.client_id=11615 AND c.service_description=28 AND c.status!=160
> GROUP BY c.gii_circuitid ) pop JOIN service_order so ON so.id =
> pop.service_order_id left JOIN client_site cs on cs.id=so.a_site_id
> left JOIN country csc on csc.id=cs.country_id JOIN circuit c ON
> so.product_id=c.product_id JOIN circuit_status cst ON cst.id=c.status
> JOIN ( SELECT c.id AS circuit_id, sg.id AS segment_id,
> c.pop_support_vendor_id AS vendor_id, v.name AS vendor_name,
> sg.a_company_name FROM segment sg JOIN circuit_layout cl ON
> cl.segment_id = sg.id AND cl.ordinal = 1 JOIN circuit c ON c.id =
> cl.circuit_id JOIN vendor v ON v.id = c.pop_support_vendor_id ) seg
> ON seg.circuit_id = c.id JOIN vendor_gtt_pop on
> vendor_gtt_pop.gtt_pop_id = seg.circuit_id offset 0) foo where
> vendor_id = 12346 AND (1=1) ORDER BY pop_name ASC,id desc LIMIT 10;
> 
>    id   | vendor_id | gaa |   pop_name    | pop_status |
> pop_location | pop_provider_id | pop_provider | pop_street 
> | pop_city | pop
> 
> _postal_code | pop_state | pop_country | pop_country_id
> 
> 
> +---+-+---++--+-+--+-+--+
> 
> -+---+-+
> 
> 684807 | 12346 | Y   | GTT/POP/LON1T | Active     | LON1T   
> |   12288 | Telehouse UK | 14 Coriander Avenue | London   | E14
> 
> 2AA |   | GB  |    219
> 
> (1 row)
> 
>  
> 
> *Time: 2245.073 ms (00:02.245)*
> 
> 
> 
> 
> On Fri, Mar 23, 2018 at 9:31 AM, Alessandro Aste
> mailto:alessandro.a...@gmail.com>> wrote:
>

Re: Seems like there is an issue with reltuples showing twice the number of rows

2018-04-03 Thread Tomas Vondra
On 04/03/2018 11:14 AM, Ranjith Ramachandra wrote:
> I am relying on reltuples on my web app to get fast row counts.
> 
> This was recommended by this article to get fast approx row
> counts: https://wiki.postgresql.org/wiki/Count_estimate
> 
> 
> However for some table I am getting twice as many values when I try to
> do this. I did some more research and came up with this query.
> 
> select reltuples, n_live_tup, n_dead_tup
>               from pg_stat_user_tables join pg_class using (relname)
>              where relname =
> 'main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d';
> 
> 
> it returns
> 
> 
>  reltuples  | n_live_tup | n_dead_tup
> ------------+------------+------------
> 2.7209e+06 |    1360448 |    1360448
> 
> 
> If I run analyze main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d
> 
> and I run the same query again,
> 
>   reltuples  | n_live_tup | n_dead_tup
> -------------+------------+------------
>  1.36045e+06 |    1360448 |    1360448
> 
> But after some time the value goes back to being double the value. This
> is causing me a lot of problems since this inaccuracy does not make any
> sense to me.
> 

There was a difference between VACUUM and ANALYZE in handling recently
dead rows (essentially deleted rows that can't be removed yet), causing
similar changes to reltuples. Essentially if you do VACUUM and ANALYZE,
it may set reltuples to rather different estimates. That is fixed now
and should be in the next minor release.

Hard to say if this is a case of that, but judging by the number of dead
tuples chances are it is.

It's probably better to use n_live_tup instead, though. I'd say that's
closer to the "live tuples" definition.
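
For example (using the table name from your query):

  SELECT n_live_tup
    FROM pg_stat_user_tables
   WHERE relname = 'main_csv_0f247511f5c247c8813ba3cec90c6ae1_yf34fbb38d';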

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ERROR: found multixact from before relminmxid

2018-04-06 Thread Tomas Vondra


On 04/06/2018 02:09 AM, Alexandre Arruda wrote:
> Hi,
> 
> Some time ago, I had this errors frequently showed in logs after some
> autovacuum in some tables(pg 9.6). VACUUM FULL or CLUSTER in this tables
> show the same and not complete the tasks (showed by some table bloat
> select).
> Then, I did a full dump/restore into a new version (10.2) and everything
> is ok for a couple of months. Now, I have errors like this again:
> 
> db1=# cluster pc44t;
> 
> ERROR:  found multixact 134100944 from before relminmxid 192042633
> 
>  
> 
> Like before, the only way to make the errors to desapear is by
> dump/reload the whole table.
> 
> 
> Thanks for any help.
> 

That's going to be hard, unless you still have the cluster around.

This surely seems like some sort of data corruption issue, but without
being able to inspect the data pages it's nearly impossible to determine
what went wrong.

We'd also need more information about what happened to the hardware and
cluster before the issues started to appear - crashes, hardware issues.
And so on.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ERROR: found multixact from before relminmxid

2018-04-06 Thread Tomas Vondra


On 04/06/2018 04:29 PM, Alexandre Arruda wrote:
> 2018-04-06 9:39 GMT-03:00 Tomas Vondra <tomas.von...@2ndquadrant.com>:
> 
> 
> 
> On 04/06/2018 02:09 AM, Alexandre Arruda wrote:
> > Hi,
> >
> > Some time ago, I had this errors frequently showed in logs after some
> > autovacuum in some tables(pg 9.6). VACUUM FULL or CLUSTER in this tables
> > show the same and not complete the tasks (showed by some table bloat
> > select).
> > Then, I did a full dump/restore into a new version (10.2) and everything
> > is ok for a couple of months. Now, I have errors like this again:
> >
> > db1=# cluster pc44t;
> >
> > ERROR:  found multixact 134100944 from before relminmxid 192042633
> >
> >  
> >
> > Like before, the only way to make the errors to desapear is by
> > dump/reload the whole table.
> >
> >
> > Thanks for any help.
> >
> 
> That's going to be hard, unless you still have the cluster around.
> 
> This surely seems like some sort of data corruption issue, but without
> being able to inspect the data pages it's nearly impossible to determine
> what went wrong.
> 
> We'd also need more information about what happened to the hardware and
> cluster before the issues started to appear - crashes, hardware issues.
> And so on.
> 
> regards
> 
> --
> Tomas Vondra                  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 
> Hi Tomas,
> The old cluster are gone, unfortunatly.
> 
> This server is a 128GB ECC RAM with a dual redundant hba fiber channel
> connect to a sotorage with Raid 6 and I don't have (apparently) any
> errors reported.
> Yesterday I did a test with one table:  some sum aggragates, count(*),
> etc, then dump/reload and repeat the tests the results (of querys) are
> the same, regarding the vacuum problem
> thats disapeared.
> 

I'm not sure I understand correctly. So you can reproduce the issue? If
yes, how can you share the scripts/data you use (and how large is it)?
If we could reproduce it locally, it would make the investigation much
easier.

BTW you mentioned you're using PostgreSQL 9.6 - which minor version,
specifically?


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: ERROR: found multixact from before relminmxid

2018-04-09 Thread Tomas Vondra


On 04/09/2018 01:49 PM, Alexandre Arruda wrote:
> 
> 
> 2018-04-06 13:11 GMT-03:00 Tomas Vondra <tomas.von...@2ndquadrant.com>:
> 
> 
> 
> On 04/06/2018 04:29 PM, Alexandre Arruda wrote:
> > 2018-04-06 9:39 GMT-03:00 Tomas Vondra <tomas.von...@2ndquadrant.com>:
> >
> >
> >
> >     On 04/06/2018 02:09 AM, Alexandre Arruda wrote:
> >     > Hi,
> >     >
> >     > Some time ago, I had this errors frequently showed in logs
> after some
> >     > autovacuum in some tables(pg 9.6). VACUUM FULL or CLUSTER in
> this tables
> >     > show the same and not complete the tasks (showed by some
> table bloat
> >     > select).
> >     > Then, I did a full dump/restore into a new version (10.2)
> and everything
> >     > is ok for a couple of months. Now, I have errors like this
> again:
> >     >
> >     > db1=# cluster pc44t;
> >     >
> >     > ERROR:  found multixact 134100944 from before relminmxid
> 192042633
> >     >
> >     >  
> >     >
> >     > Like before, the only way to make the errors to desapear is by
> >     > dump/reload the whole table.
> >     >
> >     >
> >     > Thanks for any help.
> >     >
> >
> >     That's going to be hard, unless you still have the cluster around.
> >
> >     This surely seems like some sort of data corruption issue, but
> without
> >     being able to inspect the data pages it's nearly impossible to
> determine
> >     what went wrong.
> >
> >     We'd also need more information about what happened to the
>     hardware and
> >     cluster before the issues started to appear - crashes,
> hardware issues.
> >     And so on.
> >
> >     regards
> >
> >     --
> >     Tomas Vondra                  http://www.2ndQuadrant.com
> >     PostgreSQL Development, 24x7 Support, Remote DBA, Training &
> Services
> >
> >
> > Hi Tomas,
> > The old cluster are gone, unfortunatly.
> >
> > This server is a 128GB ECC RAM with a dual redundant hba fiber channel
> > connect to a sotorage with Raid 6 and I don't have (apparently) any
> > errors reported.
> > Yesterday I did a test with one table:  some sum aggragates, count(*),
> > etc, then dump/reload and repeat the tests the results (of querys) are
> > the same, regarding the vacuum problem
> > thats disapeared.
> >
> 
> I'm not sure I understand correctly. So you can reproduce the issue? If
> yes, how can you share the scripts/data you use (and how large is it)?
> If we could reproduce it locally, it would make the investigation much
> easier.
> 
> BTW you mentioned you're using PostgreSQL 9.6 - which minor version,
> specifically?
> 
> 
> regards
> 
> 
> Hi Tomas,
> 
> No, I can't reproduce. What I did is a simple way to "validate" the
> current table data to see if a dump/reload
> preserve them. Old postgresql was 9.6.5. The problem returns now in new
> 10.3 installation.
> 

I'm confused. First you say you can't reproduce the issue, then you say
it came back on a new installation. So which is it?

> There is a way to correct this tables without a dump/reload ?
> 
> I'm thinking to reinstall cluster doing a initdb --data-checksums,
> but I'm affraid about a severe performance impact.
>  

The performance impact should be non-measurable for most workloads. It
might be a couple of percent on certain workloads/hardware, but that's
probably not your case.

The bigger question is whether this can actually detect the issue. If
it's due to a storage issue, then perhaps yes. But if you only see
multixact issues consistently and nothing else, it might easily be a
PostgreSQL bug (in which case the checksums will be correct).
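
(Side note: whether a cluster was initialized with checksums can be
checked at any time with a simple

  SHOW data_checksums;

so you can verify the setting after the reinitialization.)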


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Shared buffers increased but cache hit ratio is still 85%

2018-07-18 Thread Tomas Vondra
On 07/18/2018 10:43 AM, Andreas Kretschmer wrote:
> 
> 
> Am 18.07.2018 um 10:26 schrieb Hans Schou:
>> Am I doing something wrong or should some history be cleared?
> 
> Reset the stats for that database. You can check the date of last reset
> with:
> 
> select stats_reset from pg_stat_database where datname = 'database_name';
> 
> and reset it with:
> 
> pg_stat_reset()
> 
> Reset all statistics counters for the current database to zero
> (requires superuser privileges by default, but EXECUTE for this function
> can be granted to others.)
> 

It might be better to note current values of the counters somewhere, and
compute a delta later (and use that to compute the cache hit ratio). The
issue is that pg_stat_reset() throws away all sorts of interesting and
important stats, including those driving autovacuum/autoanalyze.
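
For example, capture the raw counters now and then again later:

  SELECT now() AS captured_at, blks_hit, blks_read
    FROM pg_stat_database
   WHERE datname = current_database();

and compute the hit ratio from the differences between the two samples,
rather than from the totals accumulated since the last reset.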


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: is there any adverse effect on DB if I set autovacuum scale factor to zero?

2018-08-13 Thread Tomas Vondra




On 08/13/2018 11:07 AM, Raghavendra Rao J S V wrote:

Hi All,

We are using postgres 9.2 on the CentOS operating system. We have
around 1300+ tables. We have the following autovacuum settings enabled.
Still, a few of the tables (84 tables) which are always busy are not
vacuumed. Dead tuples in those tables are more than 5000. Due to that,
the tables are bloating and we observed performance degradation in a
few areas.




You don't say how large the tables are, so it's impossible to say 
whether 5000 dead tuples is excessive or not. IMHO it's a negligible 
amount and should not lead to excessive bloat or issues.


A certain amount of wasted space is expected - it's a trade-off between 
immediate and delayed cleanup. If you delay the cleanup a bit, it's 
going to be more efficient overall.


It's also unclear why the tables are not vacuumed - it may easily be due 
to all the autovacuum workers being constantly busy, unable to cleanup 
all tables in a timely manner. In that case lowering the threshold is 
not going to help, on the contrary.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: is there any adverse effect on DB if I set autovacuum scale factor to zero?

2018-08-13 Thread Tomas Vondra




On 08/13/2018 03:41 PM, Raghavendra Rao J S V wrote:


Hi Tomas,

Thank you very much for your response.

As we know, a table becomes a candidate for the autovacuum process based
on the formula below:

  autovacuum VACUUM threshold for a table =
    autovacuum_vacuum_scale_factor * number of tuples + autovacuum_vacuum_threshold




Current settings in my database are as follows:

  autovacuum_vacuum_scale_factor = 0.1
  autovacuum_vacuum_threshold = 40



Due to the above formula, dead tuples accumulate based on the number of
live tuples, as shown by the query below.




select relname,n_live_tup,n_dead_tup,(n_live_tup*.1+40) 
expected_to_autovacuum,* from pg_stat_user_tables

where  n_dead_tup>800
order by n_live_tup desc
limit 100;




In order to avoid the dead tuple accumulation I would like to change the
autovacuum settings in "postgresql.conf" as below:

  autovacuum_vacuum_scale_factor = 0.01
  autovacuum_vacuum_threshold = 100


OK, so the tables apparently have enough dead tuples to trigger vacuum. 
That means the autovacuum throughput is insufficient to do all the 
cleanup. If you lower the scale factor, the amount of cleanup will 
*increase* (more tables being eligible for cleanup), making it less 
likely autovacuum can keep up.


You need to increase the throughput, by increasing vacuum_cost_limit or 
something like that.
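
For example, something like this in postgresql.conf (illustrative values
only - the right numbers depend on your hardware and workload):

  autovacuum_vacuum_cost_limit = 1000   # default -1, i.e. fall back to vacuum_cost_limit (200)
  autovacuum_vacuum_cost_delay = 10ms   # default 20ms

followed by a config reload.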



Kindly share your views. Does it cause any adverse effect on the DB?


Well, it forces the database to do more stuff / more often, so it may 
have an adverse impact, of course. It's hard to say if it's going to be 
a win overall, because we don't know how serious the bloat is.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fwd: is there any adverse effect on DB if I set autovacuum scale factor to zero?

2018-08-13 Thread Tomas Vondra

On 08/13/2018 04:24 PM, Ashu Pachauri wrote:

+ pgsql-general

Thanks and Regards,
Ashu Pachauri


-- Forwarded message -
From: Ashu Pachauri <ashu210...@gmail.com>
Date: Mon, Aug 13, 2018 at 7:53 PM
Subject: Re: is there any adverse effect on DB if I set autovacuum scale 
factor to zero?

To: <raghavendra...@gmail.com>


The way I see autovacuum_vacuum_scale_factor is not in terms of an 
absolute number but as the percentage of any table that can consist of 
updated / deleted tuples to make it eligible for vacuuming. A factor of 
0.1 ensures that your tables would be eligible for vacuuming if more 
than 10% of the tuples are deleted/updated.
1. If you think that 10% is too high for you in terms of storage cost, 
you can decrease the number or set it to zero. But, I would advise to 
increase the value of autovacuum_vacuum_threshold to something 
reasonable if you do that, otherwise you pay the CPU cost of frequent 
vacuuming across all tables.
2. However, if your issue is not the fixed 10% overhead but the lack of 
throughput i.e. you see the number of deleted/updated tuples keeps 
increasing in an unbounded fashion, the right way to deal with it is a) 
having a higher value of autovacuum_max_workers b) a lower value for 
autovacuum_naptime.




Increasing autovacuum_max_workers is unlikely to solve the issue with 
throughput, because all the workers are throttled together - there's a 
limit on the amount of work that can be done per second. Increasing the 
number of workers is akin to allowing more cars on a highway, but also 
lowering the speed limit.


You need to increase the limit on amount of work, and lowering naptime 
is one way to do that.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Code of Conduct

2018-09-18 Thread Tomas Vondra

On 09/18/2018 01:47 PM, James Keener wrote:

 > following a long consultation process

It's not a consultation if any dissenting voice is simply ignored.
Don't sugar-coat or politicize it like this -- it was rammed down
everyone's throats. That is core's right, but don't act as everyone's
opinions and concerns were taken into consideration.


I respectfully disagree.

I'm not sure which dissenting voices you think were ignored, but from 
what I've observed in the various CoC threads the core team took the 
time to respond to all comments. That does not necessarily mean the 
resulting CoC makes everyone happy, but unfortunately that's not quite 
possible. And it does not mean it was not an honest consultation.


IMO the core team did a good job in listening to comments, tweaking the 
wording and/or explaining the reasoning. Kudos to them.



There are a good number of folks who are concerned that this CoC is
overreaching and is ripe for abuse. Those concerns were always
simply, plainly, and purposely ignored.
No, they were not. There were multiple long discussions about exactly 
these dangers. You may dislike the outcome, but it was not ignored.


 > Please take time to read and understand the CoC, which is intended to 
ensure that PostgreSQL remains an open and enjoyable project for anyone 
to join and participate in.


I sincerely hope so, and that it doesn't become a tool to enforce social 
ideology like in other groups I've been part of. Especially since this 
is the main place to come to get help for PostgreSQL and not a social club.




Ultimately, it's a matter of trust that the CoC committee and core team 
apply the CoC in a careful and cautious way. Based on my personal 
experience with most of the people involved in both groups I'm not 
worried about this part.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Help to understand Actual Rows vs Plan Rows from the query planner output

2018-09-23 Thread Tomas Vondra
          Plan Rows: 50                                              
>                                   +
>              Plan Width: 0                                              
>                                   +
>              Actual Startup Time: 0.048                                 
>                                   +
>              Actual Total Time: 0.048                                   
>                                   +
>              Actual Rows: 1                                             
>                                   +
>              Actual Loops: 1                                            
>                                   +
>              Index Cond: "(lower((city)::text) = 'melanyfort'::text)"   
>                                   +
>            - Node Type: "Bitmap Index Scan"                             
>                                   +
>              Parent Relationship: "Member"                              
>                                   +
>              Parallel Aware: false                                      
>                                   +
>              Index Name: "lower_state_users_idx"                        
>                                   +
>              Startup Cost: 0.00                                         
>                                   +
>              Total Cost: 5.87                                           
>                                   +
>              Plan Rows: 211                                             
>                                   +
>              Plan Width: 0                                              
>                                   +
>              Actual Startup Time: 0.102                                 
>                                   +
>              Actual Total Time: 0.102                                   
>                                   +
>              Actual Rows: 211                                           
>                                   +
>              Actual Loops: 1                                            
>                                   +
>              Index Cond: "(lower((state)::text) = 'ohio'::text)"        
>                                   +
>    Planning Time: 0.260                                                 
>                                   +
>    Triggers:                                                            
>                                   +
>    Execution Time: 0.249
> (1 row)
> 
> aruprakshit=# 
> 
> 
> --
> 
> In the first node type of "Bitmap Index Scan” on “users_lower_idx”, I
> see the plan rows are 50, but actual rows it got 1. In the second node
> type of "Bitmap Index Scan” on “ lower_state_users_idx”, I see the plan
> rows are 211, and actual rows 211. Both are same. Based on what
> conditions planner estimated the planned and actual rows count? 
>
> 
> In node type “BitmapAnd”, I see again the actual rows 1, then why on the
> final plan i.e. Node Type: "Bitmap Heap Scan” again planner estimated
> rows 1? How does it counts these? What does the Loops count says us
> about the query?  
>

It's not very clear to me whether you're asking how the planner computes
estimates in general, or how it computed these particular estimates (or
what issues you see there).

Perhaps this would give you at least some answers:

https://www.postgresql.org/docs/11/static/row-estimation-examples.html
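
For example, assuming the table behind those indexes is called "users",
you can compare the estimated and actual row counts for one of the
conditions in isolation:

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT * FROM users WHERE lower(city) = 'melanyfort';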

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why the index is not used ?

2018-10-07 Thread Tomas Vondra
Hi,

On 10/07/2018 08:32 PM, ROS Didier wrote:
> Hi Francisco
> 
>   Thank you for your remark. 
> You're right, but it's the only procedure I found to make search on
> encrypted fields with good response times (using index) !
> 

Unfortunately, that kinda invalidates the whole purpose of in-database
encryption - you'll have encrypted on-disk data in one place, and then
plaintext right next to it. If you're dealing with credit card numbers,
then you presumably care about PCI DSS, and this is likely a direct
violation of that.

> Regarding access to the file system, our servers are in protected
network areas. few people can connect to it.
> 

Then why do you need encryption at all? If you assume access to the
filesystem / storage is protected, why do you bother with encryption?
What is your threat model?

> it's not the best solution, but we have data encryption needs and
> good performance needs too. I do not know how to do it except the
> specified procedure..
>
> if anyone has any proposals to put this in place, I'm interested.
> 

One thing you could do is hashing the value and then searching by the
hash. So aside from having the encrypted column you'll also have a short
hash, and you may use it in the query *together* with the original
condition. It does not need to be unique (in fact it should not be to
make it impossible to reverse the hash), but it needs to have enough
distinct values to make the index efficient. Say, 10k values should be
enough, because that means 0.01% selectivity.

So the function might look like this, for example:

  CREATE FUNCTION cchash(text) RETURNS int AS $$
    SELECT abs(hashtext($1)) % 10000;
  $$ LANGUAGE sql;

and then be used like this:

  CREATE INDEX idx_cartedecredit_cc02 ON cartedecredit(cchash(cc));

and in the query

  SELECT pgp_sym_decrypt(cc, 'motdepasse') FROM cartedecredit
   WHERE pgp_sym_decrypt(cc, 'motdepasse')='test value 32'
 AND cchash(cc) = cchash('test value 32');

Obviously, this does not really solve the issues with having to pass the
password to the query, making it visible in pg_stat_activity, various
logs etc.

Which is why people generally use FDE for the whole disk, which is
transparent and provides the same level of protection.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Why the index is not used ?

2018-10-08 Thread Tomas Vondra
Hi,

On 10/08/2018 04:10 PM, ROS Didier wrote:
> Hi Tomas
>    
>     Thank you for your answer and recommendation which is very
> interesting. I'm going to study the PCI DSS document right now.
> 
>   * Here is my answer to your question:
> 
> >> What is your threat model?
> 
> We want to prevent access to sensitive data for everyone except those
> who have the encryption key.
> In case of theft of files, backups or dumps, we do not want anyone to
> access sensitive data.

The thing is - encryption is not a panacea. The interesting question is
whether this improves security compared to simply using FDE and regular
access rights (which are grantable at the column level).

Using those two pieces properly may very well be a better defense than a
poorly designed encryption scheme - and based on this discussion, it
does not seem very polished / resilient.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: COPY command returns "ERROR: invalid XML content"

2019-10-07 Thread Tomas Vondra

On Sun, Oct 06, 2019 at 08:45:40PM -0700, Konstantin Izmailov wrote:

Hi,
I'm using libpq (v10) to import lots of xml files into a PG10 table. I
noticed if number of records imported exceeds 2100 then the following error
is returned:
ERROR:  invalid XML content
DETAIL:  line 1: Couldn't find end of Start Tag
timeBasedFileNamingAndTriggerin line 1
logFile.%d{-MM-dd}.%i.html

My guess is this is an issue/limitation in libxml2, which we use to
parse and process XML. Which libxml2 version do you have installed? Can
you share an example of an XML document that reproduces the issue?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Install postgres on rhel 7

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 01:00:32AM -0400, Mageshwaran Janarthanam wrote:

Hi Team,

I have question on the postgres install process. Tried to read the archives
but couldn't find it.

I wanted to install postgres server via rpm and wanted to store the
binaries under non default location like /pgbin.



The rpm packages (at least those available at yum.postgresql.org) are
not relocatable:

 rpm -qpi postgresql12-12.0-1PGDG.rhel7.x86_64.rpm  | grep Relocations
 Relocations : (not relocatable)

So unfortunately, you'll have to build your own packages, or something
like that.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: PG 12 not yet for mac

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 11:11:43AM -0400, Ravi Krishna wrote:

https://postgresapp.com/downloads.html

The link which says PG 12 is actually PG 11.



Not sure if the link is correct or not (it seems to point to
Postgres-2.3-12.dmg, which seems like it might be v12). But more
importantly, that's not a project/page managed by the PostgreSQL
community, you need to report the issues to the authors (most likely
through github issues).

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgcrypto question

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 12:05:16PM -0400, Erik Aronesty wrote:

Currently, it is my understanding that the pgcrypto library requires
the user to send a password or private key up to the server for
decryption.



Correct. In the naive case the key is included in each SQL query, which
does have various issues. Bruce Momjian has a nice extension that allows
you to fix that by loading the key into backend memory:

 http://momjian.us/download/pgcryptokey/
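
For completeness, the naive usage looks roughly like this (just a sketch;
"t" and "secret" are made-up names, and the passphrase is part of the SQL
text, so it shows up in pg_stat_activity, logs etc.):

  CREATE EXTENSION IF NOT EXISTS pgcrypto;
  CREATE TABLE t (secret bytea);   -- made-up table, just for illustration

  INSERT INTO t (secret) VALUES (pgp_sym_encrypt('some data', 'passphrase'));
  SELECT pgp_sym_decrypt(secret, 'passphrase') FROM t;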


Is there a notion of a client-side encrypt/decrypt plugin when doing a
postgres query?

For example, a user could query postgres, get back data of type
"encrypted", and a "libpq" plugin could decode/decrypt those columns
that are of data type "encrypted" in a manner transparent to the
user of the client

Obviously I could write this by intercepting the relevant libpq calls
using LD_PRELOAD or Microsoft's "Detours" ... but it there a better
way to do that?



AFAIK that's usually done at the application level, i.e. the application
is sending/receiving encrypted data, and the database simply sees bytea
columns. I'm not aware of a driver doing that transparently, but it
seems like an interesting idea - I wonder if it could be done e.g. in
psycopg as an extension, or something like that.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: pgcrypto question

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 02:51:30PM -0400, Erik Aronesty wrote:

Good idea for "psycopg".  It would be easy for a POC, but I think the
only meaningful layer to operate at would be a libpq drop-in
replacement that intercepts PQgetvalue, PQprepare, PQexecParams,
PQexecPrepared ... etc.   That way odbc, python, node, etc would "just
work" as long as you used LD_PRELOAD appropriately.



It's not clear to me how that would know which columns are encrypted,
with what key, etc. Because those encrypted columns are essentially just
regular bytea columns, so there's no easy way to distinguish them.

I'm no psycopg2 expert, but it does have some infrastructure for casting
PostgreSQL types to Python types, and I guess that could be used for the
encryption.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: temporary files

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 01:52:41PM -0700, dangal wrote:

Hello everyone, I have a question to see if you can help me, I have set
work_mem in 100 MB but I have the following in the pgbadger

Queries generating the most temporary files (N)

Count  Total size  Min size   Max size   Avg size
   58    3.24 GiB   57.15 MiB  57.15 MiB  57.15 MiB

How could it be that if you require 57 MB, 100 MB will not reach ?
thanks a lot



The on-disk and in-memory representations are not the same, and the
on-disk one is often significantly more efficient. Data that needs a
57MB temporary file may need ~150MB of memory to perform an in-memory
sort, for example.

Yet another reason may be that e.g. a hash join splits the memory into
batches, and each one has to fit into work_mem. And whenever we hit the
limit, we double the number of batches, i.e. we cut the batch size in
half. Thus the size is somewhere between 50 and 100MB, with ~75MB on
average. Furthermore, we also estimate the batch size before running the
query, so we may start with unnecessarily many batches, pushing the
average batch size down even more.


FWIW it's somewhat suspicious you have 58 temp files with almost no
variability in size. That seems as if a single query was executed
repeatedly. Maybe try looking into what query that is.
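
The per-statement detail comes from log_temp_files, so if it is not set
already, something like

  SET log_temp_files = 0;   -- log every temp file; the value is a size threshold in kB

(or the equivalent in postgresql.conf) will log each temporary file
together with the statement that created it.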


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Table locking during backup

2019-10-07 Thread Tomas Vondra

On Mon, Oct 07, 2019 at 11:00:08PM +0200, Artur Zając wrote:

Hi,



I cannot reproduce some scenario  I found in my PostgreSQL logs.



I have two connections/processes:



Process 24755 is standard pg_backup connection with:



.

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ, READ ONLY

.

LOCK TABLE gm.tableabc;

.

COPY FROM gm.tableabc



Process 5969 is normal connection where I have:



LOG: proces 5969 still waiting for AccessExclusiveLock on relations
562888531 of database 16429 after 1000.066 ms

DETAILT: Process holding the lock: 24755. Wait queue: 5969



Query that is waiting (from proces 5969) is:



CREATE TEMP TABLE IF NOT EXISTS tableabc

(

Id BIGINT DEFAULT gm.top()

) INHERITS (gm.tableabc);



I cannot reproduce what pg_dump is doing that causes waiting on proces 5969
(which lock is acquired and why). When pg_dump ends lock is released and
proces 5969 continues.



I know that doing in two parallel connections:



BEGIN;



CREATE TEMP TABLE IF NOT EXISTS tableabc

(

Id BIGINT DEFAULT gm.top()

) INHERITS (gm.tableabc);



causes  waiting state on second connection until the first finished, but
pg_dump connection is read only.



Not sure why it would matter that the pg_dump connection is read-only;
this is about locking, because pg_dump needs to ensure the schema does
not change while it's running.

pg_dump does not do

 LOCK TABLE gm.tableabc;

but

 LOCK TABLE gm.tableabc IN ACCESS SHARE MODE;

This lock should be visible in the pg_locks system view, and it does
conflict with the ACCESS EXCLUSIVE mode requested by the second query.
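
While the dump is running you can inspect that, e.g. (24755 being the
pg_dump backend from your log):

  SELECT locktype, relation::regclass, mode, granted
    FROM pg_locks
   WHERE pid = 24755;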




Could you suggest me which part of pg_dump (which query)  might cause that
behaviour.



It's this line:

https://github.com/postgres/postgres/blob/master/src/bin/pg_dump/pg_dump.c#L6676


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: timescaleDB & WAL replication

2019-10-15 Thread Tomas Vondra

On Tue, Oct 15, 2019 at 09:28:27AM +0200, basti wrote:

Hello,
I have a Master / Slave Postgres setup with WAL Replication. Now I want
to add timescaleDB.

I found this todo: https://docs.timescale.com/latest/tutorials/replication

As I understand that in the right way I just need to add the timescaleDB
extention on the master side?



AFAIK timescale is "just" a regular extension, in which case yes, all
you need to do is install it on the master side.
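
I.e. just run this on the master (assuming the timescaledb library is
installed and listed in shared_preload_libraries on both nodes - WAL
replication copies the catalog entries and data, not the binaries):

  CREATE EXTENSION IF NOT EXISTS timescaledb;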

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Tomas Vondra

On Fri, Oct 18, 2019 at 09:14:09PM -0500, Ariadne Conill wrote:

Hello,

On Fri, Oct 18, 2019 at 6:52 PM Stephen Frost  wrote:


Greetings,

* Ariadne Conill (aria...@dereferenced.org) wrote:
> On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver  
wrote:
> > https://www.postgresql.org/docs/11/functions-json.html
> > " The field/element/path extraction operators return NULL, rather than
> > failing, if the JSON input does not have the right structure to match
> > the request; for example if no such element exists"
>
> It is known that the extraction operators return NULL.  The problem
> here is jsonb_set() returning NULL when it encounters SQL NULL.
>
> > Just trying to figure why one is worse then the other.
>
> Any time a user loses data, it is worse.  The preference for not
> having data loss is why Pleroma uses PostgreSQL as it's database of
> choice, as PostgreSQL has traditionally valued durability.  If we
> should not use PostgreSQL, just say so.

Your contention that the documented, clear, and easily addressed
behavior of a particular strict function equates to "the database system
loses data and isn't durable" is really hurting your arguments here, not
helping it.

The argument about how it's unintuitive and can cause application
developers to misuse the function (which is clearly an application bug,
but perhaps an understandable one if the function interface isn't
intuitive or is confusing) is a reasonable one and might be convincing
enough to result in a change here.

I'd suggest sticking to the latter argument when making this case.

> > > I believe that anything that can be catastrophically broken by users
> > > not following upgrade instructions precisely is a serious problem, and
> > > can lead to serious problems.  I am sure that this is not the only
> > > project using JSONB which have had users destroy their own data in
> > > such a completely preventable fashion.

Let's be clear here that the issue with the upgrade instructions was
that the user didn't follow your *application's* upgrade instructions,
and your later code wasn't written to use the function, as documented,
properly- this isn't a case of PG destroying your data.  It's fine to
contend that the interface sucks and that we should change it, but the
argument that PG is eating data because the application sent a query to
the database telling it, based on our documentation, to eat the data,
isn't appropriate.  Again, let's have a reasonable discussion here about
if it makes sense to make a change here because the interface isn't
intuitive and doesn't match what other systems do (I'm guessing it isn't
in the SQL standard either, so we unfortunately can't look to that for
help; though I'd hardly be surprised if they supported what PG does
today anyway).


Okay, I will admit that saying PG is eating data is perhaps
hyperbolic,


My experience is that using such hyperbole is pretty detrimental, even
when one is trying to make a pretty sensible case. The problem is that
people often respond in a similarly hyperbolic claims, particularly when
you hit a nerve. And that's exactly what happened here, becase we're
*extremely* sensitive about data corruption issues, so when you claim
PostgreSQL is "eating data" people are likely to jump on you, beating
you with the documentation stick. It's unfortunate, but it's also
entirely predictable.


but I will also say that the behaviour of jsonb_set()
under this type of edge case is unintuitive and frequently results in
unintended data loss.  So, while PostgreSQL is not actually eating the
data, it is putting the user in a position where they may suffer data
loss if they are not extremely careful.

Here is how other implementations handle this case:

MySQL/MariaDB:

select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
  {"a":null,"b":2,"c":3}

Microsoft SQL Server:

select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
  {"b":2,"c":3}

Both of these outcomes make sense, given the nature of JSON objects.
I am actually more in favor of what MSSQL does however, I think that
makes the most sense of all.



I do mostly agree with this. The json[b]_set behavior seems rather
surprising, and I think I've seen a couple of cases running into exactly
this issue. I've solved that with a simple CASE, but maybe changing the
behavior would be better. That's unlikely to be back-patchable, though,
so maybe a better option is to create a non-strict wrapper. But that
does not work when the user is unaware of the behavior :-(
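
For the record, the CASE workaround I mean is roughly this (table, column
and path are made up; new_value stands for whatever expression supplies
the value):

   UPDATE t
      SET data = CASE WHEN new_value IS NULL THEN data
                      ELSE jsonb_set(data, '{some,key}', new_value)
                 END;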

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Replication of Replication

2019-10-19 Thread Tomas Vondra

On Fri, Oct 18, 2019 at 05:54:34PM -0400, Edilmar Alves wrote:
I have 3 servers running CentOS+PG 11 
(postgresql11-server-11.5-1PGDG.rhel7.x86_64):

- s1: main db + publication
- s2: subscription of the main db
all works fine until here...
Now, I tried to config this:
- s2: publication of the same db that was replicated from s1
- s3: subscription of the db from s2
When I did this, replication s1=>s2 stopped, and replication s2=>s3 
never worked.

Is this not possible?


You'll have to share more details - error messages from the server log,
how you check that the replication stopped, etc.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: releasing space

2019-10-19 Thread Tomas Vondra

On Thu, Oct 17, 2019 at 05:20:09PM +, Julie Nishimura wrote:

Hello everybody, We are running PostgreSQL 9.6.2 cluster master ->
standby (streaming replication). 22 tb of space (constantly struggling
with the space, pruning the old data, but not fast enough). The biggest
db takes 16 tb. So, we've copied it to another server, and now we would
like to delete it from our original source, to free up the space. What
would be the right approach for this?  Just issue drop database command
(16tb). How long it might take? Should we do it gradually (drop biggest
tables first)? Any suggestions? Caveats?



Generally speaking, DROP DATABASE simply recursively drops all the
various objects - indexes, tables, etc. It's mostly just deleting the
files, which should not be very expensive (we certainly don't need to
delete all the data or anything), but there's a certain amount of I/O
involved. But whether that's an issue does depend on the OS / filesystem /
hardware.

So if you want to be on the safe side, you can drop the objects one by
one, with a bit of delay between them, to throttle the I/O a bit.
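
In practice that can be a trivial generated psql script - thanks to
autocommit each DROP runs in its own transaction, so the files get
unlinked gradually (object names are placeholders, of course):

   DROP TABLE big_schema.big_table_1;
   SELECT pg_sleep(10);
   DROP TABLE big_schema.big_table_2;
   SELECT pg_sleep(10);
   -- ... and so on; once the database is mostly empty, run
   -- DROP DATABASE bigdb; while connected to a different database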

FWIW the latest minor release for 9.6 is 9.6.15, you're 13 minor
versions (~30 months) of fixes behind. You might want to consider
upgrading ...


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Has there been any discussion of custom dictionaries being defined in the database?

2019-10-19 Thread Tomas Vondra

On Thu, Oct 17, 2019 at 11:52:39AM +0200, Tom Lane wrote:

Morris de Oryx  writes:

Given that Amazon is bragging this week about turning off Oracle, it seems
like they could kick some resources towards contributing something to the
Postgres project. With that in mind, is the idea of defining dictionaries
within a table somehow meritless, or unexpectedly difficult?


Well, it'd just be totally different.  I don't think anybody cares to
provide two separate definitions of common dictionaries (which'd have to
somehow be kept in sync).

As for why we did it with external text files in the first place ---
for at least some of the dictionary types, the point is that you can
drop in data files that are available from upstream sources, without any
modification.  Getting the same info into a table would require some
nonzero amount of data transformation.



IMHO being able to load dictionaries from a table would be quite
useful, and not just because of RDS. For example, it's not entirely true
we're just using the upstream dictionaries verbatim - it's quite common
to add new words, particularly in specialized fields. That's way easier
when you can do that through a table and not through a file.


Having said that ... in the end a dictionary is really just a set of
functions implementing the dictionary API; where they get their data
from is their business.  So in theory you could roll your own
dictionary that gets its data out of a table.  But the dictionary API
would be pretty hard to implement except in C, and I bet RDS doesn't
let you install your own C functions either :-(



Not sure. Of course, if we expect the dictionary to work just like the
ispell one, with preprocessing the dictionary into shmem, then that
requires C. I don't think that's entirely necessary, though - we could
use the table directly. Yes, that would be slower, but maybe it'd be
sufficient.

But I think the idea is ultimately that we'd implement a new dict type
in core, and people would just specify which table to load data from.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 11:26:50AM -0400, Andrew Dunstan wrote:


...

The hyperbole here is misplaced. There is a difference between a bug and
a POLA violation. This might be the latter, but it isn't the former. So
please tone it down a bit. It's not the function that's unsafe, but the
ill-informed use of it.


We invented jsonb_set() (credit to Dmitry Dolgov). And we've had it
since 9.5. That's five releases ago.  So it's a bit late to be coming to
us telling us it's not safe (according to your preconceptions of what it
should be doing).


We could change it prospectively (i.e. from release 13 on) if we choose.
But absent an actual bug (i.e. acting contrary to documented behaviour)
we do not normally backpatch such changes, especially when there is a
simple workaround for the perceived problem. And it's that policy that
is in large measure responsible for Postgres' deserved reputation for
stability.



Yeah.



Incidentally, why is your function written in plpgsql? Wouldn't a simple
SQL wrapper be better?


   create or replace function safe_jsonb_set
       (target jsonb, path text[], new_value jsonb,
        create_missing boolean default true)
   returns jsonb as
   $func$
       select case when new_value is null then target
                   else jsonb_set(target, path, new_value, create_missing)
              end
   $func$ language sql;


And if we were to change it I'm not at all sure that we should do it the
way that's suggested here, which strikes me as no more intuitive than
the current behaviour. Rather I think we should possibly fill in a json
null in the indicated place.



Not sure, but that seems rather confusing to me, because it's mixing SQL
NULL and JSON null, i.e. it's not clear to me why

   jsonb_set(..., "...", NULL)

should do the same thing as

   jsonb_set(..., "...", 'null'::jsonb)

I'm not entirely surprised it's what MySQL does ;-) but I'd say treating
it as a deletion of the key (just like MSSQL) is somewhat more sensible.
But I admit it's quite subjective.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 11:21:26AM -0400, Stephen Frost wrote:

Greetings,

* Dmitry Dolgov (9erthali...@gmail.com) wrote:

If we want to change it, the question is where to stop? Essentially we have:

update table set data = some_func(data, some_args_with_null);

where some_func happened to be jsonb_set, but could be any strict function.


I don't think it makes any sense to try and extrapolate this out to
other strict functions.  Functions should be strict when it makes sense
for them to be- in this case, it sounds like it doesn't really make
sense for jsonb_set to be strict, and that's where we stop it.



Yeah. I think the issue here is (partially) that other databases adopted
similar functions after us, but decided to use a different behavior. It
might be more natural for the users, but that does not mean we should
change the other strict functions.

Plus I'm not sure if SQL standard says anything about strict functions
(I found nothing, but I looked only very quickly), but I'm pretty sure
we can't change how basic operators behave, and we translate them to
function calls (e.g. 1+2 is int4pl(1,2)).


I wonder if in this case it makes sense to think about an alternative? For
example, there is generic type subscripting patch, that allows to update a
jsonb in the following way:

update table set jsonb_data[key] = 'value';

It doesn't look like a function, so it's not a big deal if it will handle NULL
values differently. And at the same time one can argue, that people, who are
not aware about this caveat with jsonb_set and NULL values, will most likely
use it due to a bit simpler syntax (more similar to some popular programming
languages).


This seems like an entirely independent thing ...



Right. Useful, but entirely separate feature.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 12:47:39PM -0400, Andrew Dunstan wrote:


On 10/19/19 12:32 PM, David G. Johnston wrote:

On Sat, Oct 19, 2019 at 9:19 AM Tomas Vondra
mailto:tomas.von...@2ndquadrant.com>>
wrote:

>
>We invented jsonb_set() (credit to Dmitry Dolgov). And we've had it
>since 9.5. That's five releases ago.  So it's a bit late to be
coming to
>us telling us it's not safe (according to your preconceptions of
what it
>should be doing).
>


There have been numerous complaints and questions about this behavior
in those five years; and none of the responses to those defenses has
actually made the current behavior sound beneficial but rather have
simply said "this is how it works, deal with it".



I haven't seen a patch, which for most possible solutions should be
fairly simple to code. This is open source. Code speaks louder than
complaints.



IMHO that might be a bit too harsh - I'm not surprised no one sent a
patch when we're repeatedly telling people "you're holding it wrong".
Without a clear consensus what the "correct" behavior is, I wouldn't
send a patch either.





>
>We could change it prospectively (i.e. from release 13 on) if we
choose.
>But absent an actual bug (i.e. acting contrary to documented
behaviour)
>we do not normally backpatch such changes, especially when there is a
>simple workaround for the perceived problem. And it's that policy
that
>is in large measure responsible for Postgres' deserved reputation for
>stability.
>

Yeah.


Agreed, this is v13 material if enough people come on board to support
making a change.




We have changed such things in the past. But maybe a new function might
be a better way to go. I haven't given it enough thought yet.



I think the #1 thing we should certainly do is explaining the behavior
in the docs.






>And if we were to change it I'm not at all sure that we should do
it the
>way that's suggested here, which strikes me as no more intuitive than
>the current behaviour. Rather I think we should possibly fill in
a json
>null in the indicated place.
>

Not sure, but that seems rather confusing to me, because it's
mixing SQL
NULL and JSON null, i.e. it's not clear to me why

[...]

But I admit it's quite subjective.


Providing SQL NULL to this function and asking it to do something with
that is indeed subjective - with no obvious reasonable default, and I
agree that "return a NULL" while possible consistent is probably the
least useful behavior that could have been chosen.  We should never
have allowed an SQL NULL to be an acceptable argument in the first
place, and can reasonably safely and effectively prevent it going
forward.  Then people will have to explicitly code what they want to
do if their data and queries present this invalid unknown data to the
function.




How exactly do we prevent a NULL being passed as an argument? The only
thing we could do would be to raise an exception, I think. That seems
like a fairly ugly thing to do, I'd need a heck of a lot of convincing.



I don't know, but if we don't know what the "right" behavior with NULL
is, is raising an exception really that ugly?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: releasing space

2019-10-19 Thread Tomas Vondra

On Sat, Oct 19, 2019 at 04:34:32PM -0700, Adrian Klaver wrote:

On 10/19/19 4:17 PM, Julie Nishimura wrote:
Thank you, Thomas. Do you know if it is safe to replicate 9.6.2 
(smaller) db to 9.6.15 (larger capacity) using pg_basebackup? Would 
it be considered as an upgrade?


pg_basebackup backups an entire Postgres cluster which will be many 
databases. So when you say db do mean a Postgres cluster or an 
individual database?




My understanding is Julie wants to create a copy of a 9.6.2 cluster
using pg_basebackup and then run 9.6.15 on it. That's OK, it's
essentially a minor version upgrade.

FWIW Julie, please don't top post - it just makes it harder to follow
the discussion. Also, this seems like a completely separate question,
unrelated to the DROP DATABASE one. It might be better to start a new
thread instead of repurposing an existing one.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: jsonb_set() strictness considered harmful to data

2019-10-20 Thread Tomas Vondra

On Sun, Oct 20, 2019 at 03:48:05PM -0400, Andrew Dunstan wrote:


On 10/20/19 1:14 PM, David G. Johnston wrote:

On Sun, Oct 20, 2019 at 5:31 AM Andrew Dunstan
mailto:andrew.duns...@2ndquadrant.com>> wrote:

And yet another is to
raise an exception, which is easy to write but really punts the issue
back to the application programmer who will have to decide how to
ensure
they never pass in a NULL parameter.


That's kinda the point - if they never pass NULL they won't encounter
any problems but as soon as the data and their application don't see
eye-to-eye the application developer has to decide what they want to
do about it.  We are in no position to decide for them and making it
obvious they have a decision to make and implement here doesn't seem
like a improper position to take.



The app dev can avoid this problem today by making sure they don't pass
a NULL as the value. Or they can use a wrapper function which does that
for them. So frankly this doesn't seem like much of an advance. And, as
has been noted, it's not consistent with what either MySQL or MSSQL do.
In general I'm not that keen on raising an exception for cases like this.



I think the general premise of this thread is that the application
developer does not realize that may be necessary, because it's a bit
surprising behavior, particularly when having more experience with other
databases that behave differently. It's also pretty easy to not notice
this issue for a long time, resulting in significant data loss.

Let's say you're used to the MSSQL or MySQL behavior, you migrate your
application to PostgreSQL or whatever - how do you find out about this
behavior? Users are likely to visit

   https://www.postgresql.org/docs/12/functions-json.html

but that says nothing about how jsonb_set works with NULL values :-(

You're right raising an exception may not be the "right behavior" for
whatever definition of "right". But I kinda agree with David that it's
somewhat reasonable when we don't know what the "universally correct"
thing is (or when there's no such thing). IMHO that's better than
silently discarding some of the data.

FWIW I think the JSON/JSONB part of our code base is amazing, and the
fact that various other databases adopted something very similar over
the last couple of years just confirms that. And if this is the only
speck of dust in the API, I think that's pretty amazing.

I'm not sure how significant this issue actually is - it's true we got a
couple of complaints over the years (judging by a quick search for
jsonb_set and NULL in the archives), but I'm not sure that's enough to
justify any changes in backbranches. I'd say no, but I have no idea how
many people are affected by this but don't know about it ...

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: CPU SPIKE

2019-10-20 Thread Tomas Vondra

On Mon, Oct 21, 2019 at 10:45:33AM +0530, Pawan Sharma wrote:

Hi All,

Having a really high CPU issue (95-98%) with SELECT statements; the select
queries contain multiple AND operators - will this cause a CPU spike?

apps team is using sub-partition, PG11, CPU:24, Mem: 16GB



Any query execution is using CPU, so yes - it can cause a spike in CPU
usage. The question is whether the query plan is efficient, or whether
it's using the CPU unnecessarily. You need to show us EXPLAIN ANALYZE
and perhaps explain why you think it shouldn't use this much CPU.
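
Ideally something like this, run on the actual problematic query (BUFFERS
is optional but usually helpful):

   EXPLAIN (ANALYZE, BUFFERS)
   SELECT ... ;   -- the query causing the spike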

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2019-10-20 Thread Tomas Vondra

On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:


On 10/20/19 4:18 PM, Tomas Vondra wrote:

On Sun, Oct 20, 2019 at 03:48:05PM -0400, Andrew Dunstan wrote:


On 10/20/19 1:14 PM, David G. Johnston wrote:

On Sun, Oct 20, 2019 at 5:31 AM Andrew Dunstan
mailto:andrew.duns...@2ndquadrant.com>> wrote:

    And yet another is to
    raise an exception, which is easy to write but really punts the
issue
    back to the application programmer who will have to decide how to
    ensure
    they never pass in a NULL parameter.


That's kinda the point - if they never pass NULL they won't encounter
any problems but as soon as the data and their application don't see
eye-to-eye the application developer has to decide what they want to
do about it.  We are in no position to decide for them and making it
obvious they have a decision to make and implement here doesn't seem
like a improper position to take.



The app dev can avoid this problem today by making sure they don't pass
a NULL as the value. Or they can use a wrapper function which does that
for them. So frankly this doesn't seem like much of an advance. And, as
has been noted, it's not consistent with what either MySQL or MSSQL do.
In general I'm not that keen on raising an exception for cases like
this.



I think the general premise of this thread is that the application
developer does not realize that may be necessary, because it's a bit
surprising behavior, particularly when having more experience with other
databases that behave differently. It's also pretty easy to not notice
this issue for a long time, resulting in significant data loss.

Let's say you're used to the MSSQL or MySQL behavior, you migrate your
application to PostgreSQL or whatever - how do you find out about this
behavior? Users are likely to visit

   https://www.postgresql.org/docs/12/functions-json.html

but that says nothing about how jsonb_set works with NULL values :-(




We should certainly fix that. I accept some responsibility for the omission.



+1





You're right raising an exception may not be the "right behavior" for
whatever definition of "right". But I kinda agree with David that it's
somewhat reasonable when we don't know what the "universally correct"
thing is (or when there's no such thing). IMHO that's better than
silently discarding some of the data.



I'm not arguing against the idea of improving the situation. But I am
arguing against a minimal fix that will not provide much of value to a
careful app developer. i.e. I want to do more to support app devs.
Ideally they would not need to use wrapper functions. There will be
plenty of situations where it is mighty inconvenient to catch an
exception thrown by jsonb_set(). And catching exceptions can be
expensive. You want to avoid that if possible in your
performance-critical plpgsql code.



True. And AFAIK catching exceptions is not really possible in some code,
e.g. in stored procedures (because we can't do subtransactions, so no
exception blocks).





FWIW I think the JSON/JSONB part of our code base is amazing, and the
fact that various other databases adopted something very similar over
the last couple of years just confirms that. And if this is the only
speck of dust in the API, I think that's pretty amazing.



TY. When I first saw the SQL/JSON spec I thought I should send a request
to the SQL standards committee for a royalty payment, since it looked so
familiar ;-)



;-)





I'm not sure how significant this issue actually is - it's true we got a
couple of complaints over the years (judging by a quick search for
jsonb_set and NULL in the archives), but I'm not sure that's enough to
justify any changes in backbranches. I'd say no, but I have no idea how
many people are affected by this but don't know about it ...




No, no backpatching. As I said upthread, this isn't a bug, but it is
arguably a POLA violation, which is why we should do something for
release 13.



WFM


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2019-10-21 Thread Tomas Vondra

On Mon, Oct 21, 2019 at 08:06:46AM -0700, Adrian Klaver wrote:

On 10/20/19 11:07 PM, Tomas Vondra wrote:

On Sun, Oct 20, 2019 at 06:51:05PM -0400, Andrew Dunstan wrote:




True. And AFAIK catching exceptions is not really possible in some code,
e.g. in stored procedures (because we can't do subtransactions, so no
exception blocks).



Can you explain the above to me as I thought there are exception 
blocks in stored functions and now sub-transactions in stored 
procedures.




Sorry for the confusion - I've not been particularly careful when
writing that response.

Let me illustrate the issue with this example:

   CREATE TABLE t (a int);

   CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
   DECLARE
  msg TEXT;
   BEGIN
 -- SAVEPOINT s1;
 INSERT INTO t VALUES (1);
 -- COMMIT;
   EXCEPTION
 WHEN others THEN
   msg := SUBSTR(SQLERRM, 1, 100);
   RAISE NOTICE 'error: %', msg;
   END; $$;

   CALL test();

If you uncomment the SAVEPOINT, you get

   NOTICE:  error: unsupported transaction command in PL/pgSQL

because savepoints are not allowed in stored procedures. Fine.

If you uncomment the COMMIT, you get

   NOTICE:  error: cannot commit while a subtransaction is active

which happens because the EXCEPTION block creates a subtransaction, and
we can't commit when it's active.

But we can commit outside the exception block:

   CREATE OR REPLACE PROCEDURE test() LANGUAGE plpgsql AS $$
   DECLARE
  msg TEXT;
   BEGIN
 BEGIN
   INSERT INTO t VALUES (1);
 EXCEPTION
   WHEN others THEN
 msg := SUBSTR(SQLERRM, 1, 100);
 RAISE NOTICE 'error: %', msg;
  END;
  COMMIT;
   END; $$;


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Barman

2019-10-31 Thread Tomas Vondra

On Thu, Oct 31, 2019 at 05:29:34PM +, Daulat Ram wrote:

Hi All,

We have configured postgres 11.2 in streaming replication (primary &
Standby)  on docker and I am looking to initiate the Postgres backup
using barman. As I know there are few options for taking backup using
barman.

RSYNC backup
Incremental Backups
Streaming Backup with continuous WAL streaming
Centralized and Catalogued Backups

Which is the best option for backup using barman? So that we can keep
the database safe in case of disaster? I feel the Incremental Backups
are most useful to perform the PITR but I want to know the experts
suggestions.



You're mixing a number of topics, here. Firstly, all backups done by
barman are centralized and catalogued, that's pretty much one of the
main purposes of barman.

When it comes to backup methods, there are two basic methods: rsync and
postgres (which means pg_basebackup). This is about creating the initial
base backup. Both methods then can replicate WAL by either streaming or
archive_command.

So first you need to decide whether to use rsync or pg_basebackup,
where rsync allows advanced features like incremental backup, parallel
backup and deduplication.

Then you need to decide whether to use archive_command or streaming
(i.e. pg_receivexlog).

The "right" backup method very much depends on the size of your
database, activity, and so on. By default you should probably go with
the default option, described as "scenario 1" in the barman docs, i.e.
pg_basebackup (backup_method = postgres) and WAL streaming.
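
Roughly, the per-server config for that scenario looks like this (the
values are placeholders - see the barman docs for the exact setup):

   [pg]
   description = "main cluster"
   conninfo = host=pg.example.com user=barman dbname=postgres
   streaming_conninfo = host=pg.example.com user=streaming_barman
   backup_method = postgres
   streaming_archiver = on
   slot_name = barman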

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: How to import Apache parquet files?

2019-11-05 Thread Tomas Vondra

On Tue, Nov 05, 2019 at 04:21:45PM +0100, Softwarelimits wrote:

Hi Imre, thanks for the quick response - yes, I found that, but I was not
sure if it is already production ready - also I would like to use the data
with the timescale extension, that is why I need a full import.



Well, we're not in the position to decide if parquet_fdw is production
ready, that's something you need to ask the author of the extension (and
then also judge yourself).

That being said, I think FDW is probably the best way to do this. It's
explicitly designed to work with foreign data, so using it to access
parquet files seems somewhat natural.

The alternative is probably transforming the data into COPY format, and
then load it into Postgres using COPY (either as a file, or stdin).

Which of these options is the right one depends on your requirements.
FDW is more convenient, but row-based and probably significantly less
efficient than COPY. So if you have a lot of these parquet files, I'd
probably use the COPY. But maybe the ability to query the parquet files
directly (with FDW) is useful for you.
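
The COPY route might look roughly like this - 'parquet-to-csv' is just a
placeholder for whatever converter you use, and FROM PROGRAM requires
superuser or the pg_execute_server_program role:

   COPY measurements FROM PROGRAM 'parquet-to-csv /data/file.parquet'
     WITH (FORMAT csv, HEADER true);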

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BigSQL pgc alternative

2019-12-21 Thread Tomas Vondra

On Fri, Dec 20, 2019 at 12:15:06PM -0300, Samuel Teixeira Santos wrote:

Hi all.

BigSQL still allows installing Postgres and other resources as a
portable install.

But today, it only offers its tool (pgc) for newer PostgreSQL versions.

I would like to install it as a portable option because it's easier to
configure and use under my own user on my development workstation.

What would you recommend as an alternative to the pgc tool for installing
PostgreSQL 10 and the respective PostGIS version as a portable option?



I'm not familiar with BigSQL, so I'm not sure what exactly you mean when
you say "portable option". Can you explain?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonb_set() strictness considered harmful to data

2020-01-17 Thread Tomas Vondra

On Wed, Jan 08, 2020 at 05:24:05PM +1030, Andrew Dunstan wrote:

On Wed, Jan 8, 2020 at 7:08 AM Pavel Stehule  wrote:


Hi

po 6. 1. 2020 v 22:34 odesílatel Andrew Dunstan 
 napsal:



Updated version including docco and better error message.

cheers

andrew



I think so my objections are solved. I have small objection

+ errdetail("exception raised due to \"null_value_treatment := 
'raise_exception'\""),
+ errhint("to avoid, either change the null_value_treatment argument or ensure that 
an SQL NULL is not used")));

"null_value_treatment := 'raise_exception'\""

it use proprietary PostgreSQL syntax for named parameters. Better to use 
ANSI/SQL syntax

"null_value_treatment => 'raise_exception'\""

It is fixed in attached patch

source compilation without warnings,
compilation docs without warnings
check-world passed without any problems

I'll mark this patch as ready for committer

Thank you for your work




Thanks for the review. I propose to commit this shortly.



Now that this was committed, I've updated the patch status accordingly.

Thanks!

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Avoiding out of date statistics / planner

2020-02-12 Thread Tomas Vondra

On Wed, Feb 12, 2020 at 10:23:22AM -0700, Michael Lewis wrote:

It may also be worth noting that it is possible to make autovacuum/analyze
more aggressive, perhaps only on the tables that see large changes in data
that might result in a statistics issue. If you could share a query,
explain analyze output, and pseudo code or at least description of what
sort of bulk operations are being done, then more insight could be offered.


Another thing you can do is deploy auto_explain, and log explain plan
for long-runnning queries. That won't fix the root cause, but it will
help you with confirming the root cause - you'll see the query plan,
which should give you enough context.
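
A minimal setup might be (the threshold is just an example value):

   # postgresql.conf
   shared_preload_libraries = 'auto_explain'   # requires a restart
   auto_explain.log_min_duration = '5s'
   auto_explain.log_analyze = on               # adds some overhead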

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: 1 Status of vertical clustered index - 2 Join using (fk_constraint) suggestion - 3 Status of pgsql's parser autonomization

2020-02-16 Thread Tomas Vondra

On Sun, Feb 16, 2020 at 10:38:29PM +0100, maxzor wrote:

Hello,

1. I was told that M$ SQLServer provides huge performance deltas over 
PostgreSQL when dealing with index-unaligned queries :

create index i on t (a,b, c);
select * from t where b=... and c=...;


Perhaps index-only scans might help here, but that generally does not
work for "SELECT *" queries.

Columnar storage has been tried by various companies, CitusData, 
EnterpriseDB, 2ndQuadrant, Fujitsu Enterprise Postgres. It has been 
discussed quite a lot, last thread that I was able to find being in 
2017, https://www.postgresql.org/message-id/CAJrrPGfaC7WC9NK6PTTy6YN-NN%2BhCy8xOLAh2doYhVg5d6HsAA%40mail.gmail.com 
where Fujitsu's patch made it quite far.

What is the status on such a storage manager extension interface ?



I think you're looking for threads about zheap and (especially)
zedstore. Those are two "storage manager" implementations various people
are currently working on. Neither of those is likely to make it into
pg13, though :-(

2. What do you think of adding a new syntax : 'from t join t2 using 
(fk_constraint)' ? And further graph algorithms to make automatic 
joins ?
Both 'natural join' and 'using (column_name)' are useless when the 
columns are not the same in source and destination.
Plus it is often the case that the fk_constraints are over numerous 
columns, even though this is usually advised against. But when this 
case happens there will be a significant writing speedup.


I'm not really sure what's the point / benefit here. Initially it seemed
you simply propose a syntax saying "do a join using the columns in the
FK constraint" but it's unclear to me how this implies any writing
speedup? 

I have been bothered by this to the point that I developed a 
graphical-query-builder plugin for pgModeler,
https://github.com/maxzor/plugins/tree/master/graphicalquerybuilder#automatic-join-mode 
,

but I believe such a syntax would be much better in the core!



Hm, maybe.

3. What is the status of making the internal parser of PostgreSQL less 
coupled to the core, and easier to cherry-pick from outside?
It would be great to incorporate it into companion projects : 
pgAdmin4, pgModeler, pgFormatter...




I have no idea what you mean by "less coupled" here. What are the
requirements / use cases you're thinking about?


FWIW I think it's pretty bad idea to post questions about three very
different topics into a single pgsql-hackers thread. That'll just lead
to a lot of confusion.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: DB running out of memory issues after upgrade

2020-02-18 Thread Tomas Vondra

On Tue, Feb 18, 2020 at 05:46:28PM +, Nagaraj Raj wrote:

after upgrading Postgres to v9.6.11 from v9.6.9 the DB is running into out of memory issues;
no workload has changed before and after the upgrade.

spec: RAM 16gb,4vCore
Any bug reported like this or suggestions on how to fix this issue? I 
appreciate the response..!! 



This bug report (in fact, we don't know if it's a bug, but OK) is
woefully incomplete :-(

The server log is mostly useless, unfortunately - it just says a bunch
of processes were killed (by OOM killer, most likely) so the server has
to restart. It tells us nothing about why the backends consumed so much
memory etc.

What would help us is knowing how much memory the backend (killed by
OOM) was consuming, which should be in dmesg.
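
Something like this usually shows the anon-rss of the killed backend:

   dmesg -T | grep -iE 'out of memory|killed process'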

And then MemoryContextStats output - you need to connect to a backend
consuming a lot of memory using gdb (before it gets killed) and do

 (gdb) p MemoryContextStats(TopMemoryContext)
 (gdb) q

and show us the output printed into server log. If it's a backend
running a query, it'd help knowing the execution plan.

It would also help knowing the non-default configuration, i.e. stuff
tweaked in postgresql.conf.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Check constraints do not seem to be working!!!

2020-11-11 Thread Tomas Vondra


On 11/11/20 10:06 AM, Jitendra Loyal wrote:
> Thanks Nikolay
> 
> I read that but is there a way to meet the above requirement. And I will
> like to add that IS NULL and IS NOT NULL should evaluate to true/false.
> These operators are made for this and should not be returning NULL.
> 

This has nothing to do with IS [NOT] NULL, it's the first part of the
expression (b = TRUE) causing trouble. Essentially, the constraint

(b = true) and (c is not null)

is evaluated in two steps. First we evaluate the two parts individually,
and for (null, true) the results would look like this:

(b = true) => null
(c is not null) => true

and then we combine those results using 'AND'

null AND true => null

which is considered as if the constraint matches. If you want to handle
NULL for the first expression, you may do this, for example:

(b is not null and b = true) and (c is not null)

Or something like that.
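
A quick illustration of the difference, for a row with b = NULL and a
non-NULL c:

   SELECT (NULL::boolean = true)            AS eq_part,      -- NULL
          (NULL::boolean = true) AND true   AS whole_check,  -- NULL, row passes
          (NULL::boolean IS NOT NULL AND NULL::boolean = true)
             AND true                       AS with_guard;   -- false, row rejected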


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Extended statistics for correlated columns, row estimates when values are not in MCVs list

2020-12-07 Thread Tomas Vondra



On 12/4/20 11:53 PM, Michael Lewis wrote:
> On version 12.5, I have a query similar to the below where I am getting
> rather unfortunate row estimates and a sub-optimal index choice as a result.
> 
> SELECT
>   id
> FROM
>   messages
> WHERE
>   client_id = 1234
>   and site_id = 889977
>   and message_type_id in ( 59, 62, 102, 162 )
>   and sent_on > NOW() - INTERVAL '3 days';
> 
> I created extended statistics (all types) on client_id and site_id,
> analyzed the columns, and in pg_stats_ext.dependencies (4 = client_id, 7 =
> site_id) have:
> {"4 => 7": 0.002997, "7 => 4": 0.896230}
> 
> When I check row estimates like below, I get results that indicate the
> columns are expected to be independent still.
> explain SELECT id FROM messages; --889594304
> explain SELECT id FROM messages where client_id = 1234; --133439
> explain SELECT id FROM messages where site_id = 889977; --28800
> explain SELECT id FROM messages where client_id = 1234 and site_id =
> 889977; --4
> 

I was a bit confused about this initially, because why would it still be
estimated as independent clauses, with the extended stats defined? But I
think the explanation is fairly simple - the combination of values is
simply rare enough not to be included in the MCV list, so it probably
gets estimated using the "regular" logic as if independent. But we still
mark the clauses as estimated, so the functional dependencies are not
really considered.

(I wonder if the code in PG14 would work better, but that's of little
use of course.)

> However, I pick a client & site ID pair which show up in the MCVs list,
> then I get the same estimate when querying for that site_id with or without
> including the client_id. That is great.
> 

I think this is consistent with the above explanation - in this case the
MCV actually kicks in, significantly improving the estimate.

> Is it reasonable to expect that if the correlation between two columns is
> rather high, then the optimizer might figure the columns are not
> independent and perhaps would give less weight to the value derived from
> independent column statistics? With table statistics, it is possible to set
> a static value or ratio for something like ndistinct. Any chance for
> something similar on dependency someday?
> 
> Perhaps I am expecting too much or have a poor understanding of what
> extended statistics can or someday might do. I deal with under estimates
> from correlations between client_id and sites or other similar
> dependent objects a fair bit and am hopeful to resolve some of those
> planning problems with extended stats, without the maintenance overhead of
> migrating everything to a single client per database to get more specific
> statistics.
> 

What you might try is defining the statistics with only the functional
dependencies. That should consider the column-level correlation even
when the combination of values is not in the MCV. It might make the
"good" estimate worse, but that's an inherent trade-off.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Potential BRIN Index Corruption

2020-12-09 Thread Tomas Vondra
On 12/9/20 12:07 AM, Huan Ruan wrote:
> HI Alvaro
> 
> Unfortunately those records were written a while ago and we no longer keep
> their WAL logs. Thanks for your help anyway.
> 

Can you estimate when roughly the records were written? E.g. by using a
rough estimate of WAL or XIDs generated per day, or something like that.
Maybe there was some I/O issue in that time period and a couple writes
got lost, or something like that.

Are there any other corrupted indexes on the table?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Is there a way to dump schema to files on disk and keep them in sync

2020-12-13 Thread Tomas Vondra
On 12/13/20 6:34 PM, Adrian Klaver wrote:
> On 12/12/20 6:58 PM, Tim Uckun wrote:
>> I want to dump my postgres schema to disk in neat directories like
>> pgadmin presents. Then I want to be able to edit the files and sync
>> changes to the database and ideally if changes were made in the
>> database to sync them back to the disk.
>>
>> Is there a tool that does this? Is there a tool that will dump the
>> schema into separate directories and files like pgadmin does?
> 
> pgAdmin does not create directories, it just organizes the contents of
> the system catalogs into GUI elements.
> 
> For schema management I would suggest a tool like the one I use
> Sqitch(https://sqitch.org/). It will organize the process of schema
> creation and management.
> 

Yeah, that was my thought too. Or maybe look at the other schema
versioning tools available - we have a list on the wiki:

https://wiki.postgresql.org/wiki/Change_management_tools_and_techniques

I'm sure it's incomplete, but it's helpful nevertheless.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Reg: User creation script/List of privileges

2022-03-30 Thread Tomas Pospisek

On 30.03.22 05:09, Sai Ch wrote:

Hi Experts,

I am looking for a query or possibility to generate all the privileges a 
user/role has.


I need this to create a user/role from one database to another with 
exact same privileges.


Kindly, share the query or way to proceed further.

Thanks & Regards,


Maybe have a look at 
https://github.com/tpo/little_shell_scripts/blob/master/psql_access_priv_decoder
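
If table-level grants are all you need, something along these lines may
already get you most of the way (the role name is a placeholder; this does
not cover databases, schemas, sequences or default privileges):

   SELECT grantee, table_schema, table_name, privilege_type
     FROM information_schema.role_table_grants
    WHERE grantee = 'some_role'
    ORDER BY table_schema, table_name;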





ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

2022-06-22 Thread Tomas Pospisek

Hi all,

while doing `cat pg_dump.dump | psql` I get the above message. Note that 
`pg_dump.dump` contains:


CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING = 
'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';


What is exactly the problem? I understand that en_US.UTF-8 and 
en_US.utf-8 is not *exactly* the same string.


However I do not understand how the difference came to be. And I do not 
know what the "right" way is and how to proceed from here.


If I `pg_dump --create` some DB on the new server (13.7-1.pgdg18.04+1) I 
get:


CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';

When I do the same on the old server (12.8-1.pgdg20.04+1) I get:

CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
LC_CTYPE = 'en_US.UTF-8';


Any hints or help?
*t




Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

2022-06-22 Thread Tomas Pospisek

On 22.06.22 21:25, Adrian Klaver wrote:

On 6/22/22 12:17, Tomas Pospisek wrote:

Hi all,

while doing `cat pg_dump.dump | psql` I get the above message. Note 
that `pg_dump.dump` contains:


 CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING 
= 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';


What is exactly the problem? I understand that en_US.UTF-8 and 
en_US.utf-8 is not *exactly* the same string.


However I do not understand how the difference came to be. And I do 
not know what the "right" way is and how to proceed from here.


If I `pg_dump --create` some DB on the new server (13.7-1.pgdg18.04+1) 
I get:


 CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';

When I do the same on the old server (12.8-1.pgdg20.04+1) I get:

 CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
LC_CTYPE = 'en_US.UTF-8';


Any hints or help?


Are dumping/restoring from one version of Postgres to another?


Yes, indeed!

If from older to newer then use the new version of pg_dump(13) to dump 
the older(12) database. Then the 13 version of restore to load the 
version 13 database.


I will. Thanks a lot Adrian!
*t




Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

2022-06-23 Thread Tomas Pospisek

On 22.06.22 22:18, Tomas Pospisek wrote:

On 22.06.22 21:25, Adrian Klaver wrote:

On 6/22/22 12:17, Tomas Pospisek wrote:

Hi all,

while doing `cat pg_dump.dump | psql` I get the above message. Note 
that `pg_dump.dump` contains:


 CREATE DATABASE some_db WITH TEMPLATE = my_own_template ENCODING 
= 'UTF8' LC_COLLATE = 'en_US.UTF-8' LC_CTYPE = 'en_US.UTF-8';


What is exactly the problem? I understand that en_US.UTF-8 and 
en_US.utf-8 is not *exactly* the same string.


However I do not understand how the difference came to be. And I do 
not know what the "right" way is and how to proceed from here.


If I `pg_dump --create` some DB on the new server 
(13.7-1.pgdg18.04+1) I get:


 CREATE DATABASE ... ENCODING = 'UTF8' LOCALE = 'en_US.utf-8';

When I do the same on the old server (12.8-1.pgdg20.04+1) I get:

 CREATE DATABASE ... ENCODING = 'UTF8' LC_COLLATE = 'en_US.UTF-8' 
LC_CTYPE = 'en_US.UTF-8';


Any hints or help?


Are dumping/restoring from one version of Postgres to another?


Yes, indeed!

If from older to newer then use the new version of pg_dump(13) to dump 
the older(12) database. Then the 13 version of restore to load the 
version 13 database.


I will. Thanks a lot Adrian!


So I used both pg_dump and pg_restore from the newer machine. Result is 
still the same.  So I'll use Tom Lane's suggestion too and fix the 
'UTF-8' spelling in the dump file:


Tom Lane wrote:

> This is probably more about dumping from different operating systems.
> The spelling of the locale name is under the control of the OS,
> and Postgres doesn't know very much about the semantics of it
> (so I think we conservatively assume that any difference in
> spelling is significant).
>
> Best bet might be to edit the dump file to adjust the locale
> spellings to match your new system.
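
For the record, that edit boils down to something like this (double-check
which spelling your new system actually uses before running it):

   sed -i 's/en_US\.UTF-8/en_US.utf-8/g' pg_dump.dump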

Many thanks to both Tom & Adrian!!!
*t




Re: ERROR: new collation (en_US.UTF-8) is incompatible with the collation of the template database (en_US.utf-8)

2022-06-24 Thread Tomas Pospisek

On 23.06.22 20:21, Adrian Klaver wrote:

On 6/23/22 10:11, Adrian Klaver wrote:

On 6/23/22 00:37, Tomas Pospisek wrote:

On 22.06.22 22:18, Tomas Pospisek wrote:

On 22.06.22 21:25, Adrian Klaver wrote:

On 6/22/22 12:17, Tomas Pospisek wrote:




So I used both pg_dump and pg_restore from the newer machine. Result 
is still the same.  So I'll use Tom Lane's suggestion too and fix the 
'UTF-8' spelling in the dump file:


Not sure why that is necessary? It seems this is low-hanging fruit 
that could be dealt with by the equivalent of lower('en_US.UTF-8') = 
lower('en_US.utf-8').


Well that was clear as mud.


:-D

My point was that I don't see why the end 
user should have to do this when it could be handled internally in the 
pg_restore code.


That would indeed be very friendly of pg_restore if it'd take that 
little task off the user :-)


+1 of course :-)

*t




Re: PostgreSQL 14.4 ERROR: out of memory issues

2022-07-14 Thread Tomas Vondra



On 7/14/22 21:25, Aleš Zelený wrote:
> 
> st 13. 7. 2022 v 2:20 odesílatel Michael Paquier  <mailto:mich...@paquier.xyz>> napsal:
> 
> On Mon, Jul 11, 2022 at 10:50:23AM +0200, Aleš Zelený wrote:
> > So far, it has happened three times (during a single week) from
> the 14.3 ->
> > 14.4 upgrade, before 14.4 we haven't suffered from such an issue.
> >
> > Questions:
> > 1)  Can we safely downgrade from 14.4 to 14.3 by shutting down the
> instance
> > and reinstalling 14.3 PG packages (to prove, that the issue
> disappear)?
> > 2) What is the best way to diagnose what is the root cause?
> 
> Hmm.  14.4 has nothing in its release notes that would point to a
> change in the vacuum or autovacuum's code paths:
> https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4
> <https://www.postgresql.org/docs/14/release-14-4.html#id-1.11.6.5.4>
> 
> There is nothing specific after a look at the changes as of, and I am
> not grabbing anything that would imply a change in memory context
> handling either:
> `git log --stat REL_14_3..REL_14_4`
> `git diff REL_14_3..REL_14_4 -- *.c`
> 
> Saying that, you should be able to downgrade safely as there are no
> changes in WAL format or such that would break things.  Saying that,
> the corruption issue caused by CONCURRENTLY is something you'd still
> have to face.
> 
> 
> Thanks, good to know that, we can use it for a test case, since we
> already hit the CONCURRENTLY bug on 14.3.
> 
> > 2022-07-02 14:48:07 CEST [3930]: [3-1] user=,db=,host=,app=
> ERROR:  out of
> > memory
> > 2022-07-02 14:48:07 CEST [3930]: [4-1] user=,db=,host=,app=
> DETAIL:  Failed
> > on request of size 152094068 in memory context
> "TopTransactionContext".
> > 2022-07-02 14:48:07 CEST [3930]: [5-1] user=,db=,host=,app= CONTEXT:
> >  automatic vacuum of table "prematch.replication.tab_queue_tmp"
> 
> This is the interesting part.  Do you happen to use logical
> replication in a custom C++ plugin?
> 
> 
> We are using logical replication to other instances (pg_output) and
> decoderbufs
> https://github.com/debezium/postgres-decoderbufs
> <https://github.com/debezium/postgres-decoderbufs> for other applications.
>  

This is probably just a red herring - std::bad_alloc is what the process
that runs into the overcommit limit gets. But the real issue (e.g.
memory leak) is likely somewhere else - different part of the code,
different process ...

> ...
> 
> Checking the RssAnon from proc/pid/status I've found some points where
> RssAnon memory usage grew very steep for a minute, but no "suspicious"
> queries/arguments were found in the instance logfile.
>  
> Any hint, on how to get the root cause would be appreciated since so far
> I've failed to isolate the issue reproducible testcase.
> At least I hope that looking for the RssAnon process memory is an
> appropriate metric, if not, let me know and I'll try to update the
> monitoring to get the root cause.
> 
> I can imagine a workaround with client application regular reconnect...,
> but u to 14.3 it works, so I'd like to fix the issue either on our
> application side or at PG side if it is a PG problem.
> 

I think it'd be interesting to get memory context stats from the
processes consuming a lot of memory. If you know which processes are
suspect (and it seems you know, because if a reconnect helps it's the
backend handling the connection), you can attach a debugger and do

   $ gdb -p $PID
   call MemoryContextStats(TopMemoryContext)

which will log info about memory contexts, just like autovacuum.
Hopefully that tells us memory context is bloated, and that might point
us to particular part of the code.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Expr. extended stats are skipped with equality operator

2022-08-07 Thread Tomas Vondra



On 8/5/22 11:50, Danny Shemesh wrote:
> Hey all !
> 
> I'm on a quest to help the planner (on pg14) use the best of several
> partial, expressional indices we have on some large tables (few TBs in
> size, billions of records).
> 
> As we know, stats for expressions in partial indices aren't gathered by
> default - so I'm tinkering with expressional extended stats to cover for
> those.
> 
> I've tackled two interesting points there:
> 1. Seems like expressional stats involving the equality operator are
> skipped or mismatched (fiddle
> <https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/5379>)
> Let's take the following naive example:
> /create table t1 (x integer[]);
> insert into t1 select array[1]::integer[] from generate_series(1,
> 10, 1);
> create statistics s1 on (x[1] = 1) from t1;
> analyze t1;
> /
> /explain analyze select * from t1 where x[1] = 1;/
> /> Seq Scan on t1 (cost=0.00..1986.00 rows=500 width=29) (actual
> time=0.009..36.035 rows=10 loops=1)/
> /
> /
> Now, of course one can just create the stat on x[1] directly in this
> case, but I have a more complex use case where an equality operator is
> beneficial; 
> should the above case be supported ? feels like I'm just missing
> something fundamental.
> 

Hmmm. The problem here is that the expression may be interpreted either
as an operator clause (A=B), or as a boolean expression ((A=B)=TRUE). In
principle we might check which option matches extended statistics, but
that's not for free :-(

So what the current code does is trying to match the more specific cases
first, leaving the "bool expression" as a last resort. That seems like a
reasonable trade off to me, but I'll think about this a bit more.

There are probably other ways to match expressions, and we can't
possibly explore all of them. For example you may create statistics on
(x=1 AND y=2) and I doubt we'll match that, because we'll try matching
individual clauses not some arbitrary combinations of clauses. (Maybe we
shouldn't even allow creating such statistics ...)

> 2. Less important, just a minor note - feel free to ignore - although
> the eq. operator above seems to be skipped when matching the ext. stats,
> I can work around this by using a CASE expression (fiddle
> <https://www.db-fiddle.com/f/wJZNH1rNwJSo3D5aByQiWX/1>);
> Building on the above example, we can:
> /create statistics s2 on (case x[1] when 1 then true else false end)
> from t1;/
> /explain analyze select * from t1 where (case x[1] when 1 then true else
> false end/
> />  Seq Scan on t1 (cost=0.00..1986.00 rows=10 width=25) (actual
> time=0.011..33.721 rows=10 loops=1)/
> /

Yes, because this ends up not being matched as an opclause, and therefore
goes all the way to boolvarsel() in clause_selectivity_ext.

> /
> What's a bit problematic here, though, is that if we mix other dependent
> columns to the extended stat, and specifically if we create an mcv, 
> queries involving the CASE expression throw with `error: unknown clause
> type 130`, where clause type == T_CaseExpr.
> 
> The second point for me would be that I've found it a bit non intuitive
> that creating an extended statistic can fail queries at query time; it
> makes sense that the mcv wouldn't work for case expressions, but it
> might've been a bit clearer to:
> 
> a. Fail this at statistic creation time, potentially, or 
> b. Convert the type numeric in the above error to its text
> representation, if we can extract it out at runtime somehow - 
> I couldn't find a mapping of clause type numerics to their names, and as
> the node tags are generated at compile time, it could be build-dependent
> and a bit hard to track down if one doesn't control the build flags
> 

Yeah, this seems like a clear bug - we should not fail queries like
this. It's a sign statext_is_compatible_clause() and the MCV code
disagrees which clauses are compatible.

Can you share an example triggering this?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Expr. extended stats are skipped with equality operator

2022-08-07 Thread Tomas Vondra
On 8/7/22 19:28, Tom Lane wrote:
> Tomas Vondra  writes:
>> Yeah, this seems like a clear bug - we should not fail queries like
>> this. It's a sign statext_is_compatible_clause() and the MCV code
>> disagrees which clauses are compatible.
> 
> Indeed.  I attempted to fix that at e33ae53dd.
> 

Thanks! That probably explains why I've been unable to reproduce it - I
hadn't realized there's a fix already.



-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




can't get psql authentication against Active Directory working

2023-02-18 Thread Tomas Pospisek

Hello all,

so I'm trying to authenticate psql (on Windows) -> postgres (on Linux) 
via Active Directory.


psql (Linux) -> postgres (Linux) with authentication against Active 
Directory does work.


However the same with psql.exe on Windows does not. I get:

D:\>C:\OSGeo4W\bin\psql.exe service=the_db
psql: error: connection to server at "dbserver.example.lan
(192.168.4.104), port 5432 failed: could not initiate GSSAPI
security context: No credentials were supplied, or the credentials
were unavailable or inaccessible: Internal credentials cache error

psql.exe from the OSGeo4W QGIS Installer *does* include GSS support. (I 
have tried with a different psql.exe without GSS support and it would 
tell me that it does not support GSS).


The .pg_service.conf file in the users $HOME directory looks like this:

[the_db]
host=dbserver.example.lan
port=5432
user=u...@example.lan
gssencmode=require

This same pg_service.conf does work for psql (Linux).

Also getting a Kerberos ticket for the service on Windows does work:

D:\> klist get postgres/dbserver.example@example.lan
[...]
This will list the ticket

But when using psql.exe it will not get a ticket for the service nor 
will it apparently use the existing service ticket.


I have tried to trace psql.exe with Windows' Process Monitor and I can't 
see it accessing any keytab file (I'm not sure whether a keytab file 
exists at all under Windows or if psql.exe instead needs to 
access some Windows service). I see that psql.exe will open and close a 
TCP connection to dbserver.example.lan, however as far as I can see that 
connection is completely irrelevant for the acquisition of a Kerberos 
ticket for the service, since that is a business purely between psql.exe 
and Active Directory, or respectively between psql.exe and the 
credentials cache. And there is no other TCP connection being opened to 
anywhere from psql.exe.


What I find suspicious about the error above is "... the credentials 
were unavailable or inaccessible: Internal credentials cache error", 
since that looks like either psql.exe can't access the (inexisting) 
keytab file, or it can't access Window's Kerberos service.


Also, I see that psql.exe is trying to access a ccapiserver.exe which 
does not exist. Should psql.exe be able to access that ccapiserver.exe 
file? That means is the OSGeo4W QGIS installer, that also installs all 
things necessary for psql missing that ccapiserver.exe executable?


* has anybody ever succeeded in authenticating with psql.exe against 
Active Directory?

* can you maybe tell me what's wrong from the error message above?
* how can I proceed from here?

Thanks a lot for any pointers and/or help!!!
*t

PS: Any way to make GSS more talkative? At this moment all that I can 
get as logs is the above "computer says no".





Re: can't get psql authentication against Active Directory working

2023-02-18 Thread Tomas Pospisek

On 18.02.23 17:16, Erik Wienhold wrote:

On 18/02/2023 15:02 CET Tomas Pospisek  wrote:

so I'm trying to authenticate psql (on Windows) -> postgres (on Linux)
via Active Directory.

psql (Linux) -> postgres (Linux) with authentication against Active
Directory does work.

However the same with psql.exe on Windows does not. I get:

  D:\>C:\OSGeo4W\bin\psql.exe service=the_db
  psql: error: connection to server at "dbserver.example.lan
  (192.168.4.104), port 5432 failed: could not initiate GSSAPI
  security context: No credentials were supplied, or the credentials
  were unavailable or inaccessible: Internal credentials cache error

psql.exe from the OSGeo4W QGIS Installer *does* include GSS support. (I
have tried with a different psql.exe without GSS support and it would
tell me that it does not support GSS).

The .pg_service.conf file in the users $HOME directory looks like this:

  [the_db]
  host=dbserver.example.lan
  port=5432
  user=u...@example.lan
  gssencmode=require

This same pg_service.conf does work for psql (Linux).


On Windows the service file is not read from $home/.pg_service.conf but
$env:appdata/postgresql/.pg_service.conf (or 
%appdata%/postgresql/.pg_service.conf
when using cmd.exe.)


Thank you Erik,

the location of the .pg_service.conf file is not the problem, as I am 
seeing that psql.exe is reacting to its contents, that is, authenticating 
differently depending on whether gssencmode is set or not (maybe I noted 
its location wrongly, since I've written the post from my notes, as I do 
not have access to the machines in question all the time).


Do you have authentication from psql.exe against Active Directory working?
*t




Re: can't get psql authentication against Active Directory working

2023-02-21 Thread Tomas Pospisek

Hi Stephen,

first: thanks a lot for replying!

On 20.02.23 15:17, Stephen Frost wrote:


* Tomas Pospisek (t...@sourcepole.ch) wrote:

so I'm trying to authenticate psql (on Windows) -> postgres (on Linux) via
Active Directory.


Looks like you're trying to do a bit more than that as you're using
GSSAPI (not SSPI, which would be the more typical method on Windows..)
and you're asking for an encrypted connection.  Is there a reason you're
going down this particular route..?


The reason I'm using the GSSAPI is that it's the only documented way to do 
authentication from psql.exe against Active Directory that I was able to 
find. Even now I am unable to find *any* documentation on how to go 
about doing auth from psql.exe to AD with SSPI. Would you have any 
pointers to documentation or a howto (or a blog post or a stackoverflow 
answer or...).


The reason I am asking for encrypted communications is that I 
apparently misinterpreted what `gssencmode` does: I thought that it would 
force the use of GSS, but apparently it forces connection encryption?


How do I force psql.exe to use GSS and/or SSPI then (instead of 
username/password authentication that it seems to be doing by default).


Maybe the reason I'm using GSS is because what I really want to achieve 
is to authenticate from QGIS with AD. QGIS (from osgeo4w) is shipping a 
libpq that is linked against the kerberos library, so that's what I 
have. I have no idea how libpq needs to be compiled so as to be able to 
do SSPI?


Also, just in case I do find out how to get my hands on (or build) a 
psql/libpq that *can* do SSPI - would I be able to swap out the 
existing libpq for the libpq with SSPI support, and would the dependent 
executables still be supposed to work?


Or am I completely misunderstanding everything?


psql (Linux) -> postgres (Linux) with authentication against Active
Directory does work.


That's good.  One thing to know about Linux vs. Windows when it comes to
Kerberos (and GSSAPI/SSPI) is that the default service name is different
between them.  See the libpq parameter krbsrvname documentation here:

https://www.postgresql.org/docs/current/libpq-connect.html


I quote that document:

"krbsrvname

Kerberos service name to use when authenticating with GSSAPI. This 
must match the service name specified in the server configuration for 
Kerberos authentication to succeed. (See also Section 21.6.) The default 
value is normally postgres, but that can be changed when building 
PostgreSQL via the --with-krb-srvnam option of configure. In most 
environments, this parameter never needs to be changed. Some Kerberos 
implementations might require a different service name, such as 
Microsoft Active Directory which requires the service name to be in 
upper case (POSTGRES).

"

I'm using the postgres server from apt.postgresql.org. I do not know 
whether it has been built with `--with-krb-srvnam` but as far as I can 
find out it was not.


Could you please give me a working example? Assuming I need to set 
`krbsrvname=POSTGRES` then where would I have to set that? In 
pg_service.conf? Or in krb5.conf? Or somewhere else?
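
(My guess - since krbsrvname is listed as a libpq connection parameter -
would be the service file, i.e. something along these lines; just a
sketch, not a tested config:

  [the_db]
  host=dbserver.example.lan
  port=5432
  user=u...@example.lan
  gssencmode=require
  krbsrvname=POSTGRES

but I'd appreciate confirmation.)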



However the same with psql.exe on Windows does not. I get:

 D:\>C:\OSGeo4W\bin\psql.exe service=the_db
 psql: error: connection to server at "dbserver.example.lan
 (192.168.4.104), port 5432 failed: could not initiate GSSAPI
 security context: No credentials were supplied, or the credentials
 were unavailable or inaccessible: Internal credentials cache error


That's a bit of an odd message to get if krbsrvname is this issue
though.


psql.exe from the OSGeo4W QGIS Installer *does* include GSS support. (I have
tried with a different psql.exe without GSS support and it would tell me
that it does not support GSS).


Providing the specific messages would possibly be helpful here..


The message I get when libpq is not compiled against GSS is:

gssencmode value "require" invalid when GSSAPI support is not 
compiled in



The .pg_service.conf file in the users $HOME directory looks like this:

 [the_db]
 host=dbserver.example.lan
 port=5432
 user=u...@example.lan
 gssencmode=require

This same pg_service.conf does work for psql (Linux).

Also getting a Kerberos ticket for the service on Windows does work:

 D:\> klist get postgres/dbserver.example@example.lan
 [...]
 This will list the ticket


Is this using MIT klist or Windows klist though?


It's using Windows klist.


But when using psql.exe it will not get a ticket for the service nor will it
apparently use the existing service ticket.


As mentioned above, GSSAPI and SSPI aren't the same thing and what I
suspect you're seeing here is that the Windows klist is using SSPI but
the psql you have is built with GSSAPI.  There used to be a Kerberos For
Windows utility that would make the bridge b

Re: can't get psql authentication against Active Directory working

2023-02-24 Thread Tomas Pospisek

Hi Stephen,

again thanks a lot for trying to help me!

On 21.02.23 16:29, Stephen Frost wrote:


* Tomas Pospisek (t...@sourcepole.ch) wrote:

On 20.02.23 15:17, Stephen Frost wrote:

* Tomas Pospisek (t...@sourcepole.ch) wrote:

so I'm trying to authenticate psql (on Windows) -> postgres (on Linux) via
Active Directory.


Looks like you're trying to do a bit more than that as you're using
GSSAPI (not SSPI, which would be the more typical method on Windows..)
and you're asking for an encrypted connection.  Is there a reason you're
going down this particular route..?


The reason I'm using the GSSAPI is it's the only documented way to do
authentication from psql.exe against Active Directory that was able to find.


I mean ... https://www.postgresql.org/docs/current/sspi-auth.html


Where should the settings that are discussed in that document be 
configured? In pg_hba.conf?


That document says:

"SSPI authentication only works when both server and client are running 
Windows, or, on non-Windows platforms, when GSSAPI is available."


I interpret that phrase like this:

* there's a case where both server and client are running Windows 
(doesn't apply to me)
* there's a case where both are running non-Windows (doesn't apply to me 
either - server is Linux, client is Windows)


So that's an extremely terse docu that is not clear at all to me.

I'd suggest to change that phrase into:

"SSPI authentication works when both server and client are running 
Windows. When server or client are on non-Windows platforms then those 
need to use GSSAPI."


- assuming that my interpretation of that super terse docu is correct of 
course.



Even now I am unable to find *any* documentation on how to go about doing
auth from psql.exe to AD with SSPI. Would you have any pointers to
documentation or a howto (or a blog post or a stackoverflow answer or...).


Sure, here's a blog post that I wrote about doing exactly that:

https://www.crunchydata.com/blog/windows-active-directory-postgresql-gssapi-kerberos-authentication


Note that that document doesn't mention SSPI a single time.

The ultra terse Postgres docu is using terms such as "SAM", "negotiate 
mode", "UPN", without a reference to their definition, which leaves to 
Windows noobs like me (and equaly Windows non-noobs) without a clue. 
Searching "kerberos SAM" or "SSPI negotiate mode" on duckduckgo doesn't 
seem to lead to useful hits. So the documentation seems to be geared to 
experts of both Kerberos and Active Directory and Windows technology 
alike which as far as I can tell is a very rare specimen.



The reason I am apparently asking for encrypted communications is that I
apparently misinterpreted what `gssencmode`: I thought that it would force
the use of GSS but apparently it's forcing a connection encryption?


gssencmode indeed requires a GSS encrypted connection.  Unfortunately,
there isn't a way to force GSS or SSPI be used for authentication today
(outside of GSS with encryption) from the client side, though that's
something that's been discussed and I think most people feel would be
good to have.  That said, if the server is configured in pg_hba for gss
(or sspi), then the client will try to authenticate that way.


As far as I can see it is *not possible* to configure the server for 
SSPI ("or sspi")? It's only possible to configure it for GSS *and* SSPI 
at the same time (?) assuming the server is running on a non-Windows 
platform?



How do I force psql.exe to use GSS and/or SSPI then (instead of
username/password authentication that it seems to be doing by default).


The server's hba needs to be configured for gss (if the server is Linux)
or sspi (if the server is on Windows) and then the client will be
informed that's what the server wants when it connects and it'll try to
authenticate that way.  It's not possible currently to force the client
to only try GSS or SSPI, what it will try is mostly driven by the server
side saying what the server wants the client to authenticate with.


Maybe the reason I'm using GSS is because what I really want to achieve is
to authenticate from QGIS with AD. QGIS (from osgeo4w) is shipping a libpq
that is linked against the kerberos library, so that's what I have. I have
no idea how libpq needs to be compiled so as to be able to do SSPI?


PostgreSQL, when built on Windows (including libpq), is always built
with SSPI support.  That said, if it's also built with GSSAPI, then SSPI
is preferred but you can set 'gsslib' to 'gssapi' to prefer gssapi be
used instead.  Now, if you're setting gssencmode to prefer or require
and the library is built with GSSAPI then we're going to try GSSAPI
first to try to set up the encrypted connection.


Also, just in case I would find ou

Re: can't get psql authentication against Active Directory working

2023-03-10 Thread Tomas Pospisek

Hi Stephen,

On 25.02.23 00:52, Stephen Frost wrote:


* Tomas Pospisek (t...@sourcepole.ch) wrote:

>>

On 21.02.23 16:29, Stephen Frost wrote:

* Tomas Pospisek (t...@sourcepole.ch) wrote:

On 20.02.23 15:17, Stephen Frost wrote:

* Tomas Pospisek (t...@sourcepole.ch) wrote:

>>

That [SSPI] document says:

"SSPI authentication only works when both server and client are running
Windows, or, on non-Windows platforms, when GSSAPI is available."

I interpret that phrase like this:

* there's a case where both server and client are running Windows (doesn't
apply to me)
* there's a case where both are running non-Windows (doesn't apply to me
either - server is Linux, client is Windows)

So that's an extremely terse docu that is not clear at all to me.

I'd suggest to change that phrase into:

"SSPI authentication works when both server and client are running Windows.
When server or client are on non-Windows platforms then those need to use
GSSAPI."

- assuming that my interpretation of that super terse docu is correct of
course.


No, that's not correct.  The blog post that I referenced is actually
using SSPI on the client and GSS on the server and it works because
they're compatible with each other on the wire.  Perhaps we could
improve the documentation you reference above but it's not actually
wrong as-is.  Perhaps this would be clearer:

SSPI and GSS are wireline compatible and can be mixed and matched
between clients and servers (where support for GSS is built into the
client library or the server), provided that there is an encryption
method which both will accept.

The encryption method bit is more of a historical artifact at this
point as modern systems have compatible AES-based encryption methods,
but that wasn't always the case.  It's also possible that it'll become
an issue in the future but at least for the moment most installations
have an AES-based compatible encryption method.


I have posted a suggestion for an improvement (via the form) to the 
pgsql-docs mailing list. Lets see what comes out of that.


I like the idea to document the *why* behind SSPI and GSS, however I 
don't understand enough about it to propose a documentation improvement.



Even now I am unable to find *any* documentation on how to go about doing
auth from psql.exe to AD with SSPI. Would you have any pointers to
documentation or a howto (or a blog post or a stackoverflow answer or...).


Sure, here's a blog post that I wrote about doing exactly that:

https://www.crunchydata.com/blog/windows-active-directory-postgresql-gssapi-kerberos-authentication


Note that that document doesn't mention SSPI a single time.


It doesn't actually need to because SSPI is the default and it all just
works..


I can confirm the "just works" part since my setup now indeed *does* work.

However the existing documentation on GSS/Kerberos/SSPI/Active directory 
is extremely terse and sparse. So being more explicit and being clearer 
would improve the situation I think. Are you able and interested in 
changing the blog article?


I'd suggest to add a paragraph like this:

Using a Postgresql client on Windows

If you use a Postgresql client on Windows then that client should
be able to automatically use Windows' SSPI API, which will do the
authentication via Active Directory without any further
configuration necessary.


I suppose I could have added in somewhere "SSPI is used on the
client for this", perhaps, but the blog post was more focused on "this
is what you do to make it work" and you don't really need to know that
the API that the client is using in this case happens to be called SSPI.


The ultra terse Postgres docu is using terms such as "SAM", "negotiate
mode", "UPN", without a reference to their definition, which leaves to
Windows noobs like me (and equaly Windows non-noobs) without a clue.


I'm certainly all for improving the documentation.  Not sure that just
spelling those out would really be enough and I'm rather confident that
trying to explain all of how Kerberos and AD work wouldn't really be
maintainable in our documentation.  Still, specific suggestion on how we
can improve would be great.


+1 - see my post to the pgsql-docs mailing list for a starter.


Searching "kerberos SAM" or "SSPI negotiate mode" on duckduckgo doesn't seem
to lead to useful hits. So the documentation seems to be geared to experts
of both Kerberos and Active Directory and Windows technology alike which as
far as I can tell is a very rare specimen.


Active Directory experts certainly should be generally familiar with
Kerberos (and the terms you reference above are actually all AD ones,
really..), so I'd argue that the documentation is mainly focused on
helping those individuals.  I do generally 

Re: Getting error while upgrading postgres from version 12 to 13

2023-03-18 Thread Tomas Pospisek

On 18.03.23 22:24, shashidhar Reddy wrote:
And adding to this, the following is also showing in the error; we tried 
changing the file permissions but still no luck:
FATAL:  could not load private key file 
"/etc/ssl/private/ssl-cert-snakeoil.key": key values mismatch


This sounds like the private SSL key you have configured is not the 
private key for the public key you have configured - they do not match.

*t





Re: Getting error while upgrading postgres from version 12 to 13

2023-03-18 Thread Tomas Pospisek
On Sun, Mar 19, 2023 at 2:09 AM shashidhar Reddy 
mailto:shashidharreddy...@gmail.com>> wrote:


Hello,

I am in a process of upgrading postgresql from version 12 to 13
using pg_upgrdrade but I am getting error stating

  FATAL:  database files are incompatible with server
  DETAIL:  The data directory was initialized by PostgreSQL version 13
, which is not compatible with this version 12.14 (Ubuntu
12.14-1.pgdg18.04+1)

Please help me on this issue.


You have configured the postgres v12 server to use a data directory that 
contains data from a postgres v13 server. Evidently that cannot work...

*t





Re: Properly handle OOM death?

2023-03-18 Thread Tomas Pospisek

On 13.03.23 21:25, Joe Conway wrote:

Hmm, well big +1 for having swap turned on, but I recommend setting 
"vm.overcommit_memory=2" even so.


I've snipped out the context here, since my advice is very unspecific: 
do use swap only as a safety net. Once your system starts swapping 
performance goes down the toilet.

*t





Re: backend crash on DELETE, reproducible locally

2018-11-06 Thread Tomas Vondra
On 11/6/18 10:54 PM, Andres Freund wrote:
> On 2018-11-06 16:47:20 -0500, Tom Lane wrote:
>> =?UTF-8?Q?Ond=c5=99ej_Bouda?=  writes:
>>>> Ondřej, as a short-term workaround you could prevent the crash
>>>> by setting that index's recheck_on_update property to false.
>>
>>> Thanks for the tip. I am unsuccessful using it, though:
>>> # ALTER INDEX public.schedulecard_overlap_idx SET (recheck_on_update = 
>>> FALSE);
>>> ERROR:  unrecognized parameter "recheck_on_update"
>>
>> Oh, for crying out loud.  That's yet a different bug.
>> I'm not sure that it's the fault of the recheck_on_update
>> feature proper though; it might be a pre-existing bug in
>> the reloptions code.  Looks like somebody forgot to list
>> RELOPT_KIND_GIST in RELOPT_KIND_INDEX, but is that the
>> fault of commit c203d6cf8 or was it busted before?
> 
> Looks new:
> +   RELOPT_KIND_INDEX = 
> RELOPT_KIND_BTREE|RELOPT_KIND_HASH|RELOPT_KIND_GIN|RELOPT_KIND_SPGIST,
> 
> there aren't any other "for all indexes" type options, so the whole
> category didn't exist before.
> 
> It also strikes me as a really bad idea, even if RELOPT_KIND_GIST
> wouldn't have been omitted: It breaks index am extensibility.
> 

Does it? The RELOPT_KIND_* stuff is hard-coded in reloptions.h anyway,
so I'm not sure how this particular thing makes it less extensible?

That being said, we also have RELOPT_KIND_BRIN, and that seems to be
missing from RELOPT_KIND_INDEX too (and AFAICS the optimization works
for all index types).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: integrate Postgres Users Authentication with our own LDAP Server

2019-05-09 Thread Tomas Vondra

On Thu, May 09, 2019 at 07:11:24AM +, M Tarkeshwar Rao wrote:

We  want to setup ldap authentication in pg_hba.conf, for Postgresql
users(other than  postgres super user).

We are getting issue with special characters by following steps given in
postgres documentation.  It is not accepting any special characters as
special characters are mandatory in our use case.

Can you please help us or have you any steps by which we can configure
any postgres with LDAP?


Please don't cross-post - this is a fairly generic question, it has
nothing to do with performance or development, so the right thing is to
send it to pgsql-general. Likewise, it makes little sense to send
questions to the "owner". I've removed the other lists from CC.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: Regarding SSL Enablement in PostgreSQL Database on different port

2023-05-02 Thread Tomas Pospisek

On 02.05.23 12:13, Magnus Hagander wrote:

On Tue, May 2, 2023 at 11:43 AM sujay kadam  wrote:


Hi PostgreSQL Team,


I want to enable SSL in PostgreSQL Database on a new port.

I don’t want the default port that is 5432 SSL enabled, but I want to configure 
another port to enable SSL on it.

As per my requirement, I cannot use the same port for normal connection and SSL 
connection.

Hence, we require a new port to be SSL enabled.


Please guide us with proper information and links to achieve the above task.


That is now how SSL in PostgreSQL works. It will always run on the
same port, and PostgreSQL will only listen on one port.

You can probably do some hacky solution to it by running something
like pgbouncer on a different port and enable SSL only in that one.
But it will be a hack. I would recommend instead reviewing your
requirements and see if you can make them work with how PostgreSQL is
designed.


Oh, I think your idea to use pgbouncer to take care of the SSL 
termination is elegant. I don't think I'd characterize it as a hack 
if properly set up. Why do you consider it a hack?
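
For illustration, a properly set up instance would need little more than
something like this (just a sketch - option names as I remember them
from pgbouncer's docs, paths and names are placeholders):

  [databases]
  mydb = host=127.0.0.1 port=5432 dbname=mydb

  [pgbouncer]
  listen_addr = *
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  client_tls_sslmode = require
  client_tls_cert_file = /etc/ssl/certs/pgbouncer.crt
  client_tls_key_file = /etc/ssl/private/pgbouncer.key

i.e. clients get SSL on port 6432 while the plain 5432 stays as it is.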

*t





Re: Question regarding specifics of GIN and pg_trgm performance and potential use of show_trgm to improve it

2023-05-25 Thread Tomas Vondra
2273.256 ms
> (13 rows)
> 
> 
> Thank you for any sort of insight into this.

I don't think indexing the number of trigrams like this can help, and
I'm not sure how to improve this (at least for the built-in GIN). It
seem similarity searches are bound to be proportional to the most
frequent trigram in the query.

I wonder if the "newer" GIN variants like RUM [1] could improve this,
but I don't think it has trgm opclasses.


regards

[1] https://github.com/postgrespro/rum

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pb with join plan

2023-06-21 Thread Tomas Vondra
On 6/21/23 00:26, Marc Millas wrote:
> 
> 
> On Tue, Jun 20, 2023 at 11:19 PM David Rowley  <mailto:dgrowle...@gmail.com>> wrote:
> 
> On Wed, 21 Jun 2023 at 08:34, Marc Millas  <mailto:marc.mil...@mokadb.com>> wrote:
> >
> > On Tue, Jun 20, 2023 at 10:14 PM David Rowley
> mailto:dgrowle...@gmail.com>> wrote:
> >>
> >> On Wed, 21 Jun 2023 at 07:42, Marc Millas  <mailto:marc.mil...@mokadb.com>> wrote:
> >> > But if I do the same with clause one OR clause 2, I have to 
> kill the request after an hour, seeing the filesystem showing more
> than 140 Mb of increased usage.
> >>
> >>
> >> > link to the anonymized plan of the req with one clause :
> https://explain.depesz.com/s/TWp4
> 
> link to the plan with the second
> clause alone: https://explain.depesz.com/s/byW5
> link to the plan with both clauses ORed (the one not
> finishing) https://explain.depesz.com/s/jHO2
> 
> 
> 
> It's quite difficult to know what the problem is you want to fix here.
> Your initial post indicated it was the query with the OR condition
> that was causing you the problems, but the plan you've posted has no
> OR condition?!
> 
> You're more likely to get help here if you take time to properly
> explain the situation and post the information that's actually
> relevant to the problem you're having, or state the problem more
> clearly, as there's a mismatch somewhere.
> 
> It might also be worth having a look at
> https://wiki.postgresql.org/wiki/Slow_Query_Questions . EXPLAIN is not
> going to tell us what part of the query is slow. I'll let the wiki
> page guide you into what to do instead.
> 
>  
> I know that page. obviously, as I have to kill the request, I cannot
> provide a explain analyze... 
> 

It's a bit weird the "victor" table is joined seemingly without any join
conditions, leading to a cross join (which massively inflates the cost
for joins above it). Maybe the anonymized plan mangles it somehow.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: move databases from a MySQL server to Postgresql.

2023-06-23 Thread Tomas Vondra
On 6/23/23 13:45, Alfredo Alcala wrote:
> Hello
> 
> I need to move some databases from a MySQL server to Postgresql.
> 
> Can someone tell me the migration procedure, tools and recommendations?
> 

I'm not an expert on this, but migrations tend to be somewhat
application-specific. I'd suggest you take a look at this wiki page:

https://wiki.postgresql.org/wiki/Converting_from_other_Databases_to_PostgreSQL#MySQL

and maybe try some of the tools mentioned there (pgloader, mysql2pgsql,
and so on).
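
For example, pgloader can usually migrate a whole database with a
one-liner along these lines (a sketch - the connection strings are
placeholders, check the pgloader docs for the details):

  pgloader mysql://user:password@mysql-host/sourcedb \
           postgresql://user:password@pg-host/targetdb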

You'll have to give it a try on your databases, and then ask questions
about practical issues you run into.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [Beginner Question]What is query jumbling?

2023-07-17 Thread Tomas Vondra



On 7/17/23 05:23, Wen Yi wrote:
> Hi community,
> I am reading the 'Generate code for query jumbling through
> gen_node_support.pl'
> (https://www.postgresql.org/message-id/E1pMk51-000puf-55%40gemulon.postgresql.org)
>  to understand the principle of the postgres's node, but I really don't know 
> the meaning of the word 'query jumbling'.
> 
> Can someone provide me some advice?
> Thanks in advance!
> 

Stuff like this is usually explained somewhere in the code base, so "git
grep" is your friend. In this case it's probably good to read the
comment in src/backend/nodes/queryjumblefuncs.c which says:

 * Normalization is implemented by fingerprinting queries, selectively
 * serializing those fields of each query tree's nodes that are judged
 * to be essential to the query. This is referred to as a query jumble.

So "jumbling" is a process to generate a "normalized" query.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Query take a long time and use no index

2023-07-17 Thread Tomas Vondra
arallel Seq Scan on data  (cost=0.00..188457.44
> rows=7452844 width=8) (actual time=0.137..2121.695 rows=5962263 loops=3)
>  Planning Time: 0.247 ms
>  JIT:
>    Functions: 11
>    Options: Inlining false, Optimization false, Expressions true,
> Deforming true
>    Timing: Generation 2.665 ms, Inlining 0.000 ms, Optimization 1.612
> ms, Emission 48.580 ms, Total 52.857 ms
>  Execution Time: 3525.656 ms
> (12 rows)
> 
> Why are the indexes not used ?

It's a common misconception that indexes necessarily improve performance
and that slow queries are slow because of missing indexes. In practice
indexes are most useful for selective queries, i.e. queries that scan
only a fraction of the table - say, a couple percent.

But this query scans ~17M rows from the table, and my guess is that's a
significant part of the table. In which case the index is useless.

Also, notice the seqscan took ~4s. Even if that drops to ~1s with an
index, the overall query will still take 12s.

> Is it a Problem of the DATE_TRUNC ?

My guess is it's more about the DISTINCT, which forces a sort before the
aggregate function. Which also means the query can't be parallel. You
can try simplifying the query step by step - remove the DISTINCT first,
then the DATE_TRUNC.

> How can I optimize?
> 

It's always going to be slow with the COUNT(DISTINCT), I'm afraid.

Not sure how much you can modify the query / database, and how accurate
results you need. If you're OK with estimates, you can try the postgres-hll
extension [1], which estimates count(distinct). For exact results, I
wrote the count_distinct extension [2], which uses a hashtable. Might be worth a
try, I guess.

Another option is to precalculate stuff, and do queries on that. That
is, you're not going to add data with old timestamps, so you can
pre-aggregate that.
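
A minimal sketch of what I mean (assuming a table "data" with a
timestamp column "ts" and the distinct-counted column being "user_id" -
adjust to your actual schema):

  CREATE MATERIALIZED VIEW daily_values AS
  SELECT date_trunc('day', ts) AS day, user_id
    FROM data
   GROUP BY 1, 2;

  -- the COUNT(DISTINCT) then runs on a much smaller relation
  SELECT day, count(DISTINCT user_id)
    FROM daily_values
   GROUP BY day;

plus a periodic REFRESH MATERIALIZED VIEW (or an incrementally
maintained table) for the recent data.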


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Query take a long time and use no index

2023-07-17 Thread Tomas Vondra
On 7/17/23 13:20, Tomas Vondra wrote:
> ...
> 
> It's always going to be slow with the COUNT(DISTINCT), I'm afraid.
> 
> Not sure how much you can modify the query / database, and how accurate
> results you need. If you're OK with estimates, you can try the postgres-hll
> extension [1], which estimates count(distinct). For exact results, I
> wrote the count_distinct extension [2], which uses a hashtable. Might be worth a
> try, I guess.
> 

Damn, I forgot to add the links:

[1] https://github.com/citusdata/postgresql-hll

[2] https://github.com/tvondra/count_distinct

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: How to grant read only functions execute permission to read only user

2023-07-17 Thread Tomas Vondra



On 7/17/23 16:11, Yi Sun wrote:
> Hello guys,
> 
> Our read only user is okay to be granted read only permission of tables.
> 
> How to grant read only functions execute permission to read only user,
> is there a simple way to do it please? If not, how to get the list of
> read only functions please? Then can grant one by one based on the list,
> thanks

What is read-only function? I don't think Postgres has anything like
that. Functions inherit the privileges of the user that executes them by
default. So if the user is read-only (i.e. has just SELECT privilege),
then the function can't do any writes either.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: error: connection to server on socket...

2023-08-01 Thread Tomas Pospisek

On 01.08.23 12:22, Amn Ojee Uw wrote:

Using the following command on my Debian 12 machine:

sudo apt-get install postgresql postgresql-15 postgresql-client-common 
postgresql-common postgresql-contrib postgresql-doc phppgadmin

and following the instruction on this 
 web page I have 
installed PostgreSQL-15.

The installation went smooth, until the following command was issued :

sudo -u postgres psql

... I get an error message that reads:

could not change directory to "/home/my_account": Permission denied
psql: error: connection to server on socket 
"/var/run/postgresql/.s.PGSQL.5432" failed: No such file or directory
    Is the server running locally and accepting connections on that socket?

What can I do to resolve this issue?


When executing the `sudo` command above:

* as which user are you doing that (you can see that via `id`)?
* in which directory are you executing it  (you can see that via `pwd`)?
* what is the home directory of the postgres user (you can see that via 
`grep postgres /etc/passwd`)
* what does `ls -ld /var/run/postgresql/; ls -l 
/var/run/postgresql/.s.PGSQL.5432` show?


You also wrote:

> I tried this, but to no avail : `sudo systemctl start postgresql`

- what does `journalctl -xeu postgresql` say?
- what does `tail -n 20 
/var/log/postgresql/postgresql-*-qgisclouddb.log` show?


*t





Re: PORT 5432 ERROR

2023-08-01 Thread Tomas Pospisek

On 01.08.23 01:18, David Beck wrote:
I am receiving an error when I enter the server port number to listen 
to—installed on all available versions, both 32-bit and 64-bit when 
applicable, even the latest version, 15.3 x64 for Windows. I downloaded 
the latest version as an example, and as I was told, I used the 5432 
port as the default port needed. I get a pop-up afterward saying, 
"Warning: The specified port number is not available. Please enter a 
different port number," than roadblocks at entering the default port 
number. I found port 5434 available though it locks at a loading screen 
"loading server Postgre version 7.4. Version 10 works on port 5434 
though the server list tree spins eternally.


Any solution or assistance would be appreciated.


How/where do you enter the server port?
*t





Re: Intermittent Issue with WAL Segment Removal in Logical Replication

2023-12-29 Thread Tomas Vondra
On 12/27/23 16:31, Kaushik Iska wrote:
> Hi all,
> 
> I'm including additional details, as I am able to reproduce this issue a
> little more reliably.
> 
> Postgres Version: POSTGRES_14_9.R20230830.01_07
> Vendor: Google Cloud SQL
> Logical Replication Protocol version 1
> 

I don't know much about Google Cloud SQL internals. Is it relatively
close to Postgres (as e.g. RDS) or are the internals very different /
modified for cloud environments?

> Here are the logs of attempt succeeding right after it fails:
> 
> 2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres
> STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2') <- FAILS
> 2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres
> ERROR:  requested WAL segment 000100060059 has already been
> removed
> 2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres
> STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2')  <- SUCCEEDS
> 2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres
> LOG:  logical decoding found consistent point at 6/5A31F050
> 
> Happy to include any additional details of my setup.
> 

I personally don't see how could this fail and then succeed, unless
Google does something smart with the WAL segments under the hood. Surely
we try to open the same WAL segment (given the LSN is the same), so how
could it not exist and then exist?

As Ron already suggested, it might be useful to see information for the
replication slot peerflow_slot_wal_testing_2 (especially the restart_lsn
value). Also, maybe show the contents of pg_wal (especially for the
segment referenced in the error message).
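
Something like this should be enough (a sketch):

  SELECT slot_name, active, restart_lsn, confirmed_flush_lsn
    FROM pg_replication_slots
   WHERE slot_name = 'peerflow_slot_wal_testing_2';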

Can you reproduce this outside Google cloud environment?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Intermittent Issue with WAL Segment Removal in Logical Replication

2023-12-29 Thread Tomas Vondra
On 12/29/23 22:28, Kaushik Iska wrote:
> I am unfortunately not really familiar with Google Cloud SQL internals
> as well. But we have seen this happen on Amazon RDS as well.
> 

Do you have a reproducer for regular Postgres?

> Could it be possible that we are requesting a future WAL segment, say
> WAL upto X is written and we are asking for X + 1? It could be that the
> error message is misleading.
> 

I don't think that should be possible. The LSN in the START_REPLICATION
comes from the replica, where it's tracked as the last LSN received from
the upstream. So that shouldn't be in the future. And it doesn't seem
to be suspiciously close to a segment boundary either.

In fact, the LSN in the message is 6/5AE67D79, but the "failed" segment
is 000100060059, which is the *preceding* one. So it can't
be in the future.

> I do not have the information from pg_replication_slots as I have
> terminated the test. I am fairly certain that I can reproduce this
> again. I will gather both the restart_lsn and contents of pg_wal for the
> failed segment. Is there any other information that would help debug
> this further?
> 

Hard to say. The best thing would be to have a reproducer script, ofc.
If that's not possible, the information already requested seems like a
good start.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: How do you optimize the disk IO when you cannot assume a file will start at a boundary then?

2024-02-22 Thread Tomas Vondra
On 2/22/24 02:22, Siddharth Jain wrote:
> Hi All,
> 
> I understand the storage layer in databases goes to great lengths to ensure:
> - a row does not cross a block boundary
> - read/writes/allocation happen in units of blocks
> etc. The motivation is that at the OS level, it reads and writes pages
> (blocks), not individual bytes. I am only concerned about SSDs but I think
> the principle applies to HDD as well.
> 
> but how can we do all this when we are not even guaranteed that the
> beginning of a file will be aligned with a block boundary? refer this
> <https://stackoverflow.com/questions/8018449/is-it-guaranteed-that-the-beginning-of-a-file-is-aligned-with-pagesize-of-file-s>
> .
> 
> Further, I don't see any APIs exposing I/O operations in terms of blocks.
> All File I/O APIs I see expose a file as a randomly accessible contiguous
> byte buffer. Would it not have been easier if there were APIs that exposed
> I/O operations in terms of blocks?
> 
> can someone explain this to me?
> 

The short answer is that this is well outside our control. We do the
best we can - split our data files into "our" 8kB pages - and hope that
the OS / filesystem will do the right thing to map this to blocks at the
storage level.

The filesystems do the same thing, to some extent - they align stuff
with respect to the beginning of the partition, but if the partition
itself is not properly aligned, that won't really work.

As for the APIs, we work with what we have in POSIX - I don't think
there are any APIs working with blocks, and it's not clear to me how
it would fundamentally differ from the APIs we have now. Moreover, it's
not really clear which "block" would matter. The postgres 8kB
page? The filesystem page? The storage block/sector size?

FWIW I think for SSDs this matters way more than for HDD, because SSDs
have to erase the space before a rewrite, which makes it much more
expensive. But that's not just about the alignment, but about the page
size (with smaller pages being better).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Performance degradation after upgrading from 9.5 to 14

2024-04-17 Thread Tomas Vondra
On 4/17/24 19:13, Johnathan Tiamoh wrote:
> Hello,
> 
> 
> I performed an  upgrade from postgresql-9.5 to postgresql-14 and the
> performance has degraded drastically.
> 
> Please, is they any advice on getting performance back ?
> 

There's very little practical advice we can provide based on this
report, because it's missing any useful details. There's a number of
things that might have caused this, but we'd have to speculate.

For example:

1) How did you upgrade? pg_dump or pg_upgrade?

2) Did you run ANALYZE to collect statistics after the upgrade?

3) Did you transfer the configuration, or did you just create a new
cluster with the default values?

4) What exactly is slower? Queries? Inserts?

5) Can you quantify the impact? Is it 2x slower? 100x slower?


regards


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Unexpected data when subscribing to logical replication slot

2024-05-08 Thread Tomas Vondra
Hi,

On 5/8/24 11:17, Daniel McKenzie wrote:
> We have a .NET application which subscribes to a logical replication slot
> using wal2json. The purpose of the application is to publish events to AWS
> SQS. We are required to first "enrich" these events by querying the
> database.
> 
> We have found that these queries will often find old data (i.e. the data
> that existed prior to the update) which is unexpected.
> 
> For example, when I use a psql terminal to update a user's last name from
> "Jones" to "Smith" then I would expect the enrichment query to find "Smith"
> but it will sometimes still find "Jones". It finds the old data perhaps 1
> in 50 times.
> 
> To reproduce this I use a psql terminal to execute an update statement
> which changes a user's last name with \watch 1.5 and monitor our
> application logs for cases where the wal2json output and the enrichment
> query output have different last names.
> 

Where/how does the enrichment query run? What does the whole process look
like? I guess an application is receiving decoded changes as JSON, and
then querying the database?

> We have compared transaction ids by adding include-xids to pg_recvlogical
> and adding txid_current() to to the enrich query and the txid_current() is
> always the xid + 1.
> 
> We have found two things that appear to resolve the problem -
> 
>- Using a more powerful EC2 instance. We can reproduce the issue with a
>r7a.medium instance but not with a r7a.large EC2 instance.
>- Changing the Postgres synchronous_commit parameter from "on" to "off".
>We cannot reproduce the issue with synchronous_commit set to "off".
> 
> We need help to understand this unexpected behaviour.
> 

Would be good to have some sort of reproducer - ideally a script that
sets up an instance + replication, and demonstrates the issue. Or at
least a sufficiently detailed steps to reproduce it without having to
guess what exactly you did.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




apt-archive.postgresql.org a bit broken?

2024-07-05 Thread Tomas Pospisek

Hello,

I get this when I try to access the archive of focal packages:

```
# apt-get update
[...]
Err:6 https://apt-archive.postgresql.org/pub/repos/apt focal-pgdg 
InRelease

  403  Forbidden [IP: 2600:9000:2761:4600:b:5965:3380:93a1 443]
[...]
E: Failed to fetch 
https://apt-archive.postgresql.org/pub/repos/apt/dists/focal-pgdg/InRelease 
 403  Forbidden [IP: 2600:9000:2761:4600:b:5965:3380:93a1 443]
E: The repository 'https://apt-archive.postgresql.org/pub/repos/apt 
focal-pgdg InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is 
therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user 
configuration details.

```

And so I can't access old, archived focal packages. I guess this isn't 
wanted behavior?


In case I should direct this problem report to some different place I'd 
be glad if you would tell me.


Thank you,
*t




Re: apt-archive.postgresql.org a bit broken?

2024-07-05 Thread Tomas Pospisek

On 05.07.24 18:27, Magnus Hagander wrote:



On Fri, Jul 5, 2024 at 6:07 PM Tomas Pospisek <mailto:t...@sourcepole.ch>> wrote:


Hello,

I get this when I try to access the archive of focal packages:

```
# apt-get update
[...]
Err:6 https://apt-archive.postgresql.org/pub/repos/apt focal-pgdg
InRelease
    403  Forbidden [IP: 2600:9000:2761:4600:b:5965:3380:93a1 443]
[...]
E: Failed to fetch
https://apt-archive.postgresql.org/pub/repos/apt/dists/focal-pgdg/InRelease
   403  Forbidden [IP: 2600:9000:2761:4600:b:5965:3380:93a1 443]
E: The repository 'https://apt-archive.postgresql.org/pub/repos/apt
focal-pgdg InRelease' is not signed.
N: Updating from such a repository can't be done securely, and is
therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user
configuration details.
```

And so I can't access old, archived focal packages. I guess this isn't
wanted behavior?

In case I should direct this problem report to some different place I'd
be glad if you would tell me.


The dist name should be "focal-pgdg-archive" not "focal-pgdg".

You are right, thank you!
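
For reference (in case someone else stumbles over this), the working
entry then looks something like this - a sketch, the component name is
assumed to be the usual "main":

  deb https://apt-archive.postgresql.org/pub/repos/apt focal-pgdg-archive main
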
*t




Re: Alignment check

2024-07-05 Thread Tomas Pospisek

On 27.06.24 18:07, Marthin Laubscher wrote:

I don’t intend dissing or plugging anyone’s efforts or start a flame 
war, but I’d like to get a sense of how the PostgreSQL community feels 
about:

a) YugabyteDB, and
b) PostgreSQL on Kubernetes.

For my application I’m deeply vested in Kubernetes as a pathway to being 
cloud-agnostic and I have looked at YugabyteDB because it matches my 
application’s (distributed) architecture more closely.
But not all the ways to run PostgreSQL on Kubernetes are created equal, 
and YugabyteDB is really far behind on versions and do not support 
extensions in a way that’s useful to me.


Having no experience with it I can't comment on YugabyteDB. With respect 
to PostgreSQL on Kubernetes there are various solutions on how to run 
it, some of which are quite mature - that is, they have been around for 
quite some time, are used heavily and have a healthy maintenance 
community (see f.ex. postgres-operator [1]).


One IMHO problematic aspect of running postgres in one or more pods is 
that running a postgres cluster is already demanding as is. When a 
postgres cluster goes awry then there will be work awaiting you to get 
it all back up and running without messing up user data... using a 
solution like postgres-operator puts an additional layer and wrapper 
around postgres, so if things do not run well there are even more 
systems you have to handle. It might or might not help that you have 
additional layers doing stuff to the postgres service:


- pro: the additional software layers can contain more operational
  knowledge than you have and handle and fix operations better than you
  know how to do
- contra: or the additional software layers can hide, obscure, obstruct
  the lower layers and interfere with you trying to debug and fix stuff.

Backups and a tested procedure to get things back up and running from 
scratch can be useful then.


All that said, from operational experience: postgres by itself is very 
robust in taking care of preserving your data, so despite everything 
written above, usually you have to be messing up things **really hard** 
to make postgres lose data.


*t

[1] https://github.com/zalando/postgres-operator/

PS: Thanks to all of you that are taking care, that postgres is caring 
so well about the user's data!





Re: page is not marked all-visible but visibility map bit is set in relation "pg_statistic"

2024-07-11 Thread Tomas Vondra
On 7/11/24 19:21, Gus Spier wrote:
> AWS RDS Postgres Aurora version 14.4
> 

I believe RDS and Aurora are two separate products, so I'm a bit
confused by this ...

> Error log shows: page is not marked all-visible but visibility map bit is
> set in relation "pg_statistic"
> 
> To me, that sounds ominous. But, there does not appear to be any great
> performance hit. The applications are chugging along nicely. The end-users
> have not yet gathered at my door, waving torches and pitch-forks.
> 
> What is the correct course of action in this  case?
> 

I'd say you need to report this to AWS support - we have no idea what
changes they made in Aurora, and AFAIK they made a lot of changes in
this area.

That is not to say we don't have any bugs in this area - see for example
the discussion in [1], but we are not in a position to investigate issues
on a proprietary product.

regards

[1]
https://www.postgresql.org/message-id/CAH2-WznuNGSzF8v6OsgjaC5aYsb3cZ6HW6MLm30X0d65cmSH6A%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PG Dump on 11 - Restore on 16 - is possible?

2024-08-13 Thread Tomas Vondra
On 8/13/24 11:47, Durumdara wrote:
> Hello!
> 
> We have to upgrade our server. The PG upgrade is impossible because of
> too much data.

I'm not sure I understand. What exactly is the problem with the amount
of data? How much data are you dealing with, and is the problem the time
it takes to copy the data, or do you have a limited amount of disk
space, or what is the issue?

I'd say pg_upgrade (in link mode) is probably orders of magnitude faster
for upgrading large clusters - probably less than a minute even for huge
clusters (multiple TB of data). And it doesn't need more disk space, it
does not copy most of the data. Typically, you create a replica on the
new system (or clone the fs volume in some way), and then switch over
and do the pg_upgrade.
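
Roughly something like this (a sketch with Debian-style paths, adjust to
your layout; always run it with --check first):

  pg_upgrade \
    --old-bindir  /usr/lib/postgresql/11/bin \
    --new-bindir  /usr/lib/postgresql/16/bin \
    --old-datadir /var/lib/postgresql/11/main \
    --new-datadir /var/lib/postgresql/16/main \
    --link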

pg_dump is pretty expensive - it exports and imports the database, has
to rebuild indexes, ... I'd be surprised if this wasn't more laborious
and time consuming than the pg_upgrade way.


> So we will create a new server with PG 16, and we have to migrate the
> databases one by one (from a PG 11 server).
> 
> Is it possible?
> 
> Does PG16 read PG11 dumps and restore them correctly on the PG16 server?
> 

Yes. The docs actually say this:

  Because pg_dump is used to transfer data to newer versions of
  PostgreSQL, the output of pg_dump can be expected to load into
  PostgreSQL server versions newer than pg_dump's version.

> Or is it unsupported now?
> 

It's true PG11 is EOL already, but AFAIK the dump/restore should work,
we try to support even unsupported versions - I don't recall if the
policy is ~10 years, but the docs say:

  pg_dump can also dump from PostgreSQL servers older than its own
  version. (Currently, servers back to version 9.2 are supported.)

I'd probably use pg_dump from the target version (PG16), but it should
not matter.
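
I.e., per database, something along these lines (a sketch - host and
database names are placeholders):

  pg_dump -Fc -h old-pg11-host -d mydb -f mydb.dump
  pg_restore -h new-pg16-host -d mydb -j 4 mydb.dump

(plus pg_dumpall --globals-only for roles and tablespaces).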


regards

-- 
Tomas Vondra




Re: About replication minimal disk space usage

2024-08-24 Thread Tomas Vondra
On 8/24/24 14:18, Manan Kansara wrote:
> Hello All,
> I have my self hosted postgres server on aws with 16gb disk space
> attached to it for ml stuff and analysis stuff we are using vertex ai so
> i have setup live replication of postgres using data stream service to
> BigQuery table.  We use BigQuery table as data warehouse because we have
> so many different data source so our data analysis and ml can
> happened at one place.
> but problem is there When i am starting replication in there pg_wal take
> whole space about 15.8gb in some days of starting replication 
> 
> *_Question_ *:  how can i setup something like that that optimally use
> disk space so old pg_wal data that are not usable can we delete  i think
> i should create one cron job which taken care whole that things but i
> don't know any approach can you please guide
> In future if as data grew i will attached more disk space to that
> instance but i want to make optimal setup so my whole disk is not in
> full usage any time and my server crash again.
> 

Why don't you just give it more disk space? I'm not a fan of blindly
throwing hardware at an issue, but 16GB is tiny these days, especially
if it's shared by both data and WAL, and the time you spend optimizing
this is likely more expensive than any savings.

If you really want to keep this on 16GB, I think we'll need more details
about what exactly you see on the instance / how it runs out of disk
space. AFAIK datastream relies on logical replication, and there's a
couple ways how that may consume disk space.

For example, the datastream replication may pause for a while, in which
case the replication slot will block removal of still-needed WAL, and if
the pause is long enough, that may be an issue. Of course, we have no
idea how much data you're dealing with (clearly not much, if it fits
onto 16GB of disk space with everything else).
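
You can check how much WAL each slot is holding back with something like
this (a sketch, assuming a reasonably recent release):

  SELECT slot_name, active, restart_lsn,
         pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(),
                                        restart_lsn)) AS retained_wal
    FROM pg_replication_slots;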

Another option is that you have a huge transaction (inserting and/or
modifying a lot of data at once), and the logical decoding ends up
spilling the decoded transaction to disk.

If you want a better answer, I think you'll have to provide a lot more
details. For example, which PostgreSQL version are you using, and how is
it configured? What config parameters have non-default values?


regards

-- 
Tomas Vondra




Re: ERROR: could not open relation with OID XXXX

2024-08-25 Thread Tomas Vondra
On 8/25/24 15:42, Marcelo Zabani wrote:
> Hi all,
> 
> I can reproduce the error in the subject from time to time when querying
> catalog tables while DDL is happening concurrently. Here's a bash script
> that reproduces it (not always, you might have to run it many times
> until you see ERROR:  could not open relation with OID ):
> 
> #!/usr/bin/env bash
> psql -c "create table test(x serial primary key); select oid, relname
> from pg_class where relname='test'"
> # The next two queries will run concurrently
> psql -c "select oid, relname, pg_sleep(3), pg_get_indexdef(oid) from
> pg_class join pg_index on indexrelid=pg_class.oid WHERE
> relname='test_pkey';" 2>&1 1>/tmp/pgbug.log &
> sleep 1
> psql -c "drop table test"
> cat /tmp/pgbug.log
> wait
> 
> I am confused as to how this is possible. I assume if the row with the
> test_pkey index exists in the pg_index catalog table, that the snapshot
> of the catalog tables contains the test table itself and is generally
> consistent, so querying the catalog should not run into such errors.
> 

I think you're assuming the whole query runs with a single snapshot, and
AFAIK that's not quite accurate - we do some special stuff for catalogs,
for example. There's also the additional complexity of maintaining a
cache on catalogs, invalidating it, etc.

I don't have a great simple "this happens because X" explanation, but a
lot of this relies on proper locking - in particular, that we lock all
the objects before execution, which also invalidates all the caches etc.

But that can't happen here, because we only realize we need to access
the OID very late in the execution, when we get to pg_get_indexdef.

> I've seen this happen in Production without pg_sleep in the mix, too. I
> added pg_sleep to the example above only because it makes the error
> easier to reproduce.
> 

It's a race condition, essentially. The sleep just makes it easier to
hit, but it can happen without it.

> Is there something I can do to avoid this? Is my understanding of how
> the catalog tables work wrong?
> 

I believe you could actually lock the pg_class rows for update. Just add
FOR UPDATE at the end of the query.


regards

-- 
Tomas Vondra




Re: ERROR: could not open relation with OID XXXX

2024-08-25 Thread Tomas Vondra



On 8/25/24 17:36, Marcelo Zabani wrote:
>> we do some special stuff for catalogs
> 
> That is good to know, thanks!
> 
>> I believe you could actually lock the pg_class rows for update. Just
> add FOR UPDATE at the end of the query.
> 
> Thanks, but I tried that and got "ERROR:  permission denied for table
> pg_class", even if I try it only for tables the user owns.
> 

Yeah, I tried that with a superuser; I didn't realize it'd fail for
regular users who don't have rights to modify pg_class directly.

> At least considering the use-case of avoiding this error due to
> temporary tables/indexes (which are a part of normal application
> execution), I was thinking of using materialized CTEs that filters those
> out, and only after that using other functions that for example take
> OIDs and return definitions. Other kinds of DDL that create
> non-temporary tables can be "blamed" on developers in my case.
> Do you think using those materialized CTEs could help? And do you think
> this can be considered a bug that I should report or is it just too
> edge-casey to consider?

I have no idea what you mean - what would be filtered out, etc. But you
can give it a try - you have a reproducer, so testing if it fails should
not be that difficult.

Other than that, I think you can do two things:

1) Make the application simply retry if it sees this error.

2) Lock the objects using LOCK TABLE before running the query (rough
sketch below).
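
For option 2, a rough sketch using the table from your reproducer (any
lock on the table blocks a concurrent DROP TABLE, which needs ACCESS
EXCLUSIVE):

  BEGIN;
  -- blocks a concurrent DROP TABLE for the duration of the transaction
  LOCK TABLE test IN ACCESS SHARE MODE;

  SELECT oid, relname, pg_get_indexdef(oid)
    FROM pg_class
    JOIN pg_index ON indexrelid = pg_class.oid
   WHERE relname = 'test_pkey';

  COMMIT;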


regards

-- 
Tomas Vondra




Re: BRIN index on timestamptz

2021-04-23 Thread Tomas Vondra




On 4/23/21 10:31 AM, Mohan Radhakrishnan wrote:

Hi,
          I am planning to use a BRIN index, as I search based on timestamptz
fields. There are millions of records. I refer to
https://www.percona.com/blog/2019/07/16/brin-index-for-postgresql-dont-forget-the-benefits


I execute this on the AWS RDS instance. Is there something in the plan I
should pay attention to? I notice the Execution Time.


Thanks,
Mohan

INSERT INTO testtab (id, date, level, msg) SELECT g, CURRENT_TIMESTAMP + 
( g || 'minute' ) :: interval, random() * 6, md5(g::text) FROM 
generate_series(1,800) as g;


Two things:

a) You need to do ANALYZE, otherwise there are no statistics the 
optimizer could use (which is why the row estimates in the plans are 
entirely bogus).


b) BRIN indexes don't work on random data, because the whole idea is
about eliminating large blocks of data (e.g. 1MB). With random data
that's not going to happen, because each such range will match anything,
which is why the seqscan is a bit faster than using the BRIN index.
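
To illustrate, a rough sketch of how I'd test this, assuming the BRIN
index is on the "date" column from your INSERT (which is monotonically
increasing and therefore well correlated with the physical row order):

  ANALYZE testtab;

  -- BRIN pays off when the indexed column is correlated with the
  -- physical order of the rows, as "date" is here
  CREATE INDEX testtab_date_brin ON testtab USING brin (date);

  EXPLAIN (ANALYZE, BUFFERS)
  SELECT count(*) FROM testtab
   WHERE date BETWEEN now() + interval '100000 minutes'
                  AND now() + interval '101000 minutes';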



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




very long secondary->primary switch time

2021-04-27 Thread Tomas Pospisek

Hello all,

I maintain a postgresql cluster that does failover via patroni. The 
problem is that after a failover happens it takes the secondary too long 
(that is about 35min) to come up and answer queries. The log of the 
secondary looks like this:



04:00:29.777 [9679] LOG:  received promote request
04:00:29.780 [9693] FATAL:  terminating walreceiver process due to 
administrator command
04:00:29.780 [9679] LOG:  invalid record length at 320/B95A1EE0: wanted 
24, got 0

04:00:29.783 [9679] LOG:  redo done at 320/B95A1EA8
04:00:29.783 [9679] LOG:  last completed transaction was at log time 
2021-03-03 03:57:46.466342+01


04:35:00.982 [9679] LOG:  selected new timeline ID: 15
04:35:01.404 [9679] LOG:  archive recovery complete
04:35:02.337 [9662] LOG:  database system is ready to accept connections

The cluster is "fairly large" with thousands of DBs (sic!) and ~1TB of data.

I would like to shorten the failover/startup time drastically. Why does 
it take the secondary that much time to switch to the primary state? 
There are no logs between 04:00 and 04:35. What is postgresql doing 
during those 35min?


I am *guessing* that postgresql *might* be doing some consistency check 
or replaying the WAL (max_wal_size: 16 GB, wal_keep_segments: 100). I am 
also *guessing* that the startup time *might* have to do with the size of
the data (~1TB) and/or with the number of DBs (thousands). If that were
the case, then splitting the cluster into multiple clusters should
allow for faster startup times?


I have tried to duckduck why the secondary takes that much time to 
switch to primary mode, but have failed to find information that would 
enlighten me. So any pointers to information, hints or help are very 
welcome.


Thanks & greets,
*t




Re: very long secondary->primary switch time

2021-04-27 Thread Tomas Pospisek

Hi Tom, hi list participants,

thanks a lot for replying Tom.

On 27.04.21 22:16, Tom Lane wrote:

Tomas Pospisek  writes:

I maintain a postgresql cluster that does failover via patroni. The
problem is that after a failover happens it takes the secondary too long
(that is about 35min) to come up and answer queries. The log of the
secondary looks like this:



04:00:29.777 [9679] LOG:  received promote request
04:00:29.780 [9693] FATAL:  terminating walreceiver process due to
administrator command
04:00:29.780 [9679] LOG:  invalid record length at 320/B95A1EE0: wanted
24, got 0
04:00:29.783 [9679] LOG:  redo done at 320/B95A1EA8
04:00:29.783 [9679] LOG:  last completed transaction was at log time
2021-03-03 03:57:46.466342+01



04:35:00.982 [9679] LOG:  selected new timeline ID: 15
04:35:01.404 [9679] LOG:  archive recovery complete
04:35:02.337 [9662] LOG:  database system is ready to accept connections



The cluster is "fairly large" with thousands of DBs (sic!) and ~1TB of data.


Hm.  WAL replay is already done at the "redo done" entry.  There is a
checkpoint after that, I believe, and there may be some effort to search
for dead files as well.  Still, if your I/O subsystem is better than
a wet noodle, 35 minutes is a long time to finish that.

One thing I'm not sure about is whether we try to do the checkpoint
at maximum speed.  If you have set GUC options to throttle checkpoint
I/O hard, that could perhaps explain this.


I didn't do much customization of the PostgreSQL settings and am mostly
running what was provided upstream. These are my explicit
settings (everything else is at the defaults):


allow_system_table_mods: 'off'
autovacuum: 'off'
checkpoint_completion_target: '0.7'
effective_cache_size: 16 GB
maintenance_work_mem: 1 GB
max_connections: 200
max_wal_size: 16 GB
shared_buffers: 8 GB
shared_preload_libraries: pg_stat_statements
track_activities: 'on'
track_activity_query_size: 32 kB
track_functions: all
wal_keep_segments: 100
work_mem: 64 MB


You could possibly learn more by strace'ing the startup process to
see what it's doing.


Will do, thanks, however I'm dreading the next failover downtime :-(
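
In case it helps anyone else, a rough sketch of what I plan to run
(assumptions: Linux, strace installed, and the startup process being
identifiable via its process title):

  # find the PID of the startup/recovery process
  ps ax | grep '[p]ostgres: startup'

  # attach to it during the next promotion and log syscalls with timestamps
  strace -tt -T -p <startup_pid> -o /tmp/startup.strace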


Also, what PG version is that exactly?


12.6-1.pgdg20.04+1

Thankful for any help or pointers regarding the long promotion time,
*t



