Re: Problem upgrading from 10 to 14 with pg_upgrade: unexpected error upgrading "template1" database for some clusters

2022-07-13 Thread Ken Yeung
From: Laurenz Albe 
Sent: Wednesday, July 13, 2022 3:16 AM

> You configured "pg_hba.conf" so that a local login without password is
> not possible.  The easiest solution is to change the "pg_hba.conf"
> entries for "localhost" in IPv4 and IPv6 to "trust" for the duration of
> the upgrade.

I am afraid this workaround may not be suitable for our case because of certain 
security consideration.

Best regards,
Ken



Issues with upserts

2022-07-13 Thread André Hänsel
The wiki entry https://wiki.postgresql.org/wiki/UPSERT kind of suggests that
with the introduction of ON CONFLICT DO UPDATE the problem of upserts is
solved. But is it?

 

A common use case for upserts is to keep a table up to date from an external
data source. So you might have a cron job that runs a relatively large batch
of upserts every couple of minutes.

 

I have found that this use case is not adequately covered by ON CONFLICT DO
UPDATE for two reasons:

- New versions are created for all rows, even if the data is identical. This
quickly fills up the WAL and puts unnecessary load on the tablespace drives.

- If the conflict target is not the serial column, the sequence backing the
serial column gets incremented for every row. This quickly reaches the point
where the serial for new rows exceeds the range of an integer.

 

Does this mean I have to SELECT the data first and do the conflict check in
the application or is there a better SQL-only way?



Re: Issues with upserts

2022-07-13 Thread Jeremy Smith
- New versions are created for all rows, even if the data is identical.
> This quickly fills up the WAL and puts unnecessary load on the tablespace
> drives.
>
> - If the conflict target is not the serial column, the sequence backing
> the serial column gets incremented for every row. This quickly reaches the
> point where the serial for new rows exceeds the range of an integer.
>
>
>

It sounds like you aren't adding a WHERE clause to prevent the duplicate
rows from being updated.  It would help if you could share your query, but
in general this could look like this:

INSERT INTO my_table (col1, col2)
SELECT col1, col2 FROM other_table
ON CONFLICT (col1, col2) DO UPDATE SET (col1, col2) = (EXCLUDED.col1,
EXCLUDED.col2)
WHERE (my_table.col1, my_table.col2) IS DISTINCT FROM (EXCLUDED.col1,
EXCLUDED.col2);


Re: Issues with upserts

2022-07-13 Thread David G. Johnston
On Wednesday, July 13, 2022, André Hänsel  wrote:

> The wiki entry https://wiki.postgresql.org/wiki/UPSERT kind of suggests
> that with the introduction of ON CONFLICT DO UPDATE the problem of upserts
> is solved. But is it?
>
>
>
> A common use case for upserts is to keep a table up to date from an
> external data source. So you might have a cron job that runs a relatively
> large batch of upserts every couple of minutes.
>
>
>
> I have found that this use case is not adequately covered by ON CONFLICT
> DO UPDATE for two reasons:
>
> - New versions are created for all rows, even if the data is identical.
> This quickly fills up the WAL and puts unnecessary load on the tablespace
> drives.
>
> - If the conflict target is not the serial column, the sequence backing
> the serial column gets incremented for every row. This quickly reaches the
> point where the serial for new rows exceeds the range of an integer.
>
>
>
> Does this mean I have to SELECT the data first and do the conflict check
> in the application or is there a better SQL-only way?
>
>
Well, first of all, don’t use serial/auto-increment in this situation.

Second, you need to figure out what the unique key in the provided data is
and key off of that for detecting duplicates.  Ideally it already contains
some bigint key column which then makes the first point easy to accept.

David J.


RE: Issues with upserts

2022-07-13 Thread André Hänsel
Jeremy Smith wrote: 

 

It sounds like you aren't adding a WHERE clause to prevent the duplicate rows 
from being updated.  It would help if you could share your query, but in 
general this could look like this:

 

INSERT INTO my_table (col1, col2)

SELECT col1, col2 FROM other_table

ON CONFLICT (col1, col2) DO UPDATE SET (col1, col2) = (EXCLUDED.col1, 
EXCLUDED.col2)

WHERE (my_table.col1, my_table.col2) IS DISTINCT FROM (EXCLUDED.col1, 
EXCLUDED.col2) 

 

Here’s an example:

 

https://dbfiddle.uk/?rdbms=postgres_14 

 &fiddle=b48d062d2eedbab14157359694b16081

 

CREATE TABLE t (

id serial PRIMARY KEY,

name text NOT NULL UNIQUE,

address text NOT NULL

);

 

INSERT INTO t(name, address)

VALUES ('foo', 'Baker street'),('bar', 'Miller street')

ON CONFLICT (name) DO UPDATE SET address = EXCLUDED.address

WHERE (t.name, t.address) IS DISTINCT FROM (EXCLUDED.name, EXCLUDED.address);

 

INSERT INTO t(name, address)

VALUES ('foo', 'Baker street'),('bar', 'Miller street')

ON CONFLICT (name) DO UPDATE SET address = EXCLUDED.address

WHERE (t.name, t.address) IS DISTINCT FROM (EXCLUDED.name, EXCLUDED.address);

 

INSERT INTO t(name, address)

VALUES ('foo', 'Baker street'),('bar', 'Miller street')

ON CONFLICT (name) DO UPDATE SET address = EXCLUDED.address

WHERE (t.name, t.address) IS DISTINCT FROM (EXCLUDED.name, EXCLUDED.address);

 

INSERT INTO t(name, address)

VALUES ('foo', 'Baker street'),('bar', 'Miller street')

ON CONFLICT (name) DO UPDATE SET address = EXCLUDED.address

WHERE (t.name, t.address) IS DISTINCT FROM (EXCLUDED.name, EXCLUDED.address);

 

SELECT last_value FROM t_id_seq;

 

This will yield “8”, showing that new sequence numbers have been generated for 
each attempt.



Re: Issues with upserts

2022-07-13 Thread Tom Lane
=?utf-8?Q?Andr=C3=A9_H=C3=A4nsel?=  writes:
> This will yield “8”, showing that new sequence numbers have been generated 
> for each attempt.

Well, yeah, because the INSERT has to be attempted first, and
that includes forming the whole candidate row including the
nextval() result.  If you're expecting a serial ID column to not
have holes in the sequence of values, you're going to be sadly
disappointed, whether you use ON CONFLICT UPDATE or not.

regards, tom lane




Re: Issues with upserts

2022-07-13 Thread David G. Johnston
On Wed, Jul 13, 2022 at 7:58 AM André Hänsel  wrote:

>
> SELECT last_value FROM t_id_seq;
>
>
>
> This will yield “8”, showing that new sequence numbers have been generated
> for each attempt.
>

Yep, an entire able-to-be-inserted tuple is formed every time.  That
requires evaluating defaults so that every column has a value.

David J.


Re: Issues with upserts

2022-07-13 Thread Adrian Klaver

On 7/13/22 07:58, André Hänsel wrote:

Jeremy Smith wrote:



CREATE TABLE t (

     id serial PRIMARY KEY,

     name text NOT NULL UNIQUE,

     address text NOT NULL

);




This will yield “8”, showing that new sequence numbers have been 
generated for each attempt.




If running out of id's is a concern use bigserial instead of serial as 
it uses bigint:


bigint  -9223372036854775808 to +9223372036854775807

vs

integer for serial:

integer -2147483648 to +2147483647

--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Seems to be impossible to set a NULL search_path

2022-07-13 Thread Bryn Llewellyn
> b...@yugabyte.com wrote:
> 
>> david.g.johns...@gmail.com  wrote:
>> 
>> As for "schema identifiers" vs. "schema names" - they both seem equally 
>> wrong. The list can very much contain sequences of characters that when 
>> interpreted as an identifier and looked for in the pg_namespace catalog do 
>> not find a matching entry and are therefore by definition not the name of 
>> any known schema in that database.
>> 
>> Besides, I hazard to guess how many times we write "table name" and "column 
>> name" in the documentation when your argument is that "table identifier" and 
>> "column identifier" is the correct choice.  No, rather "name" and 
>> "identifier" in the context of database objects are known to mean the same 
>> thing - the alphabetic name of the object.
> 
> Well, "putative" or "candidate" can be used to resolve your existence 
> criterion. But why bother? In my book, Bertie Wooster (or Bertram Wilberforce 
> Wooster if you prefer) is a perfectly fine candidate name in the general 
> English speaking culture. It's neither here nor there if there happens to be 
> any living person who has the name...
> 
> But never mind. If you'd like a diverting read on this topic, go here:
> 
> https://blogs.oracle.com/sql/post/a-collection-of-plsql-essays 
> 
> 
> look for this, and download the PDF:
> 
> «
> Names vs identifiers
> 
> Databases are full of things: tables, sequences, columns, views, PL/SQL 
> units, what have you. Things have names and are manipulated by mentioning the 
> names. The programming languages SQL and PL/SQL use identifiers, not names. 
> Questions show many programmers are confused about the difference. This note 
> describes the relationships between things, names, and identifiers. Once the 
> programming rules are absorbed, developers can write code faster and with 
> less heartburn.
> »
> 
> It's written by a former colleague with whom I spent many happy hours 
> discussing the topic.
> 
> Over and out?

Something has been nagging me since I sent this. I was sure that I’d seen this:

"Bllewell"

in some catalog view where I expected the name of a role. I just found it in 
pg_proc. The spelling Bllewell with no quotes is the name of an o/s user on my 
MacBook. I don't know why it ends up in my MacBook's PG installation—but that's 
a story for a different day. My PG cluster has a database called demo. I just 
did this test:

\c demo postgres

-- What a pain to have to do this to get silent "if exists" behavior.
do $body$
begin
  begin
drop owned by "Exotic Me" cascade;
  exception when undefined_object then
null;
  end;
  begin
drop owned by """Exotic Me""" cascade;
  exception when undefined_object then
null;
  end;
end;
$body$;

create role "Exotic Me" login password 'p';
grant connect on database demo to "Exotic Me";

create role """Exotic Me""" login password 'p';

\c demo "Exotic Me"
create schema s;

create procedure s.p()
  language plpgsql
as $body$
begin
  null;
end;
$body$;

with c as (
  select
rolname   as role_name
  from pg_roles
  union all
  select
distinct proowner::regrole::text  as role_name
  from pg_proc)
select role_name
from c
where
  lower(role_name) like '%bllewell%' or
  lower(role_name) like '%exotic%';

This is the result:

  role_name  
-
 Bllewell
 Exotic Me
 "Exotic Me"
 "Bllewell"

Of course I understand why I see both Exotic Me with no quotes and "Exotic Me" 
with double quotes: I asked for it. But why do I see both Bllewell with no 
quotes and "Bllewell" with double quotes. Are there really two distinct roles 
with those two names? Or did pg_roles end up with the identifier for the exotic 
name Bllewell rather than the bare name itself?

And on the disputed notion that the identifier for a name is a distinct 
phenomenon from the name itself, I noted this here:
https://www.postgresql.org/docs/current/datatype-oid.html

«
The input functions for these types allow whitespace between tokens, and will 
fold upper-case letters to lower case, except within double quotes; this is 
done to make the syntax rules similar to the way object names are written in 
SQL. Conversely, the output functions will use double quotes if needed to make 
the output be a valid SQL identifier.
»

It comes close (but no cigar) to making the distinction. It uses both "object 
name" and "SQL identifier" and says (more or less) that a "SQL identifier" is 
the way an object name is written in SQL by double-quoting it. I'm using 
"exotic" as a tautological shorthand for what you *must* surround with 
double-quotes in SQL and PL/pgSQL to express what you want.



Re: Seems to be impossible to set a NULL search_path

2022-07-13 Thread David G. Johnston
On Wed, Jul 13, 2022 at 4:33 PM Bryn Llewellyn  wrote:

>
>
>
>
>
> *  role_name  - Bllewell Exotic Me "Exotic Me" "Bllewell"*
>
> Of course I understand why I see both Exotic Me with no quotes and "Exotic
> Me" with double quotes: I asked for it. But why do I see both Bllewell with
> no quotes and "Bllewell" with double quotes. Are there really two distinct
> roles with those two names? Or did pg_roles end up with the identifier for
> the exotic name Bllewell rather than the bare name itself?
>
>
You seem perfectly capable of answering those last two questions yourself,
better than we who do not have the access to your database; access needed
to query pg_roles.

Given that you used UNION ALL I'm not clear why Exotic Me isn't duplicated
in the output you profess comes from the test case you wrote.  Is this
another one of your mistakes in presenting a self-contained test case?

David J.


Re: Seems to be impossible to set a NULL search_path

2022-07-13 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>>   role_name  
>> -
>>  Bllewell
>>  ...
>>  "Bllewell"
>> 
>> ...Are there really two distinct roles with those two names?...
> 
> Is this another one of your mistakes in presenting a self-contained test case?

I was far, far too elliptical in what I wrote. Sorry, David. Yes, that was a 
mistake. These are the two fundamental observations that got me started on this:

Observation #1 (the native content of "pg_roles"):

select rolname as role_name from pg_roles order by 1;

 role_name 
---
 Bllewell
 pg_database_owner
 ...
 postgres

Observation #2 (the native content of "pg_proc"):

select
  proowner::regrole as role_name,
  proname
from pg_proc
order by 1, 2
limit 3;

 role_name  |proname 
+
 "Bllewell" | RI_FKey_cascade_del
 "Bllewell" | RI_FKey_cascade_upd
 "Bllewell" | RI_FKey_check_ins

The clue to the different spellings is the ::regrole typecast. I knew all along 
that I could get the name of the owner of a function from: 

pg_proc p inner join pg_roles r on p.proowner = r.oid

But I wondered if I could find an operator (using the word loosely) to save me 
some typing. Google took me to the "Postgres get function owner" post on Stack 
Exchange:
https://dba.stackexchange.com/questions/259001/postgres-get-function-owner

Laurenz Albe answered the question thus:

«
The complicated way is to join with pg_roles and get the user name from there. 
But since PostgreSQL hackers don't want to type more than necessary, they came 
up with something else... Internally, every object is... identified.. by its 
“object ID”. This is the number that is for example used in the proowner column 
of pg_proc... but the type output function, which is used for display of the 
type, renders it as the object's name. Put that together with PostgreSQL's type 
cast operator ::, and you can do... ::regrole.
»

This was exactly what I wanted! I took Lawrence's « ::regrole renders proowner 
as the [owner]'s name » at face value. But when the name happens to be exotic, 
as is Joe, you see that ::regrole in fact renders proowner as the *identifier* 
for the function's owner's name. The testcase at the end makes the point more 
explicitly. It does some set-up that lets me do this:

select
  r.rolnameas "The owner name",
  p.proowner::regrole  as "The *identifier* for the owner name"
from
  pg_roles r
  inner join
  pg_proc p
  on r.oid = p.proowner
where p.proname = 'something_strange';

 The owner name | The *identifier* for the owner name 
+-
 Joe| "Joe"

This is what I've been banging on about all the time. It seems that I'm the 
only person in the pgsql-general community who wants some proper terms of art 
(as I used them in my column aliases) that let me say simply and clearly, why 
pg_proc.proowner::regrole gives a differently spelled answer than does 
pg_roles.role_name.

B.t.w., I looked at "8.19. Object Identifier Types". It uses the term 
"identifier" in the more ordinary sense of "unique identifier" (as are the 
values in a table's PK column that might be numeric, text, uuid, or whatever. 
It mentions quote_ident(). I tried it thus:

select quote_ident('Joe') as joe, quote_ident('"Dog"') as dog;

  joe  |dog
---+---
 "Joe" | """Dog"""

(so that's right). And it goes on to say "...names that require quoting". So 
the difference is in the air and is hinted at with "ident" and "name". But the 
wording doesn't tie things down.

I'll shut up on this now.

--
-- Testcase setup

\c postgres postgres
create role "Joe" login password 'p';

create database tmp;
grant connect on database tmp to "Joe";
grant create on database tmp to "Joe";

\c tmp "Joe"
create schema s;
create procedure s.something_strange()
  language plpgsql
as $body$
begin
  null;
end;



Re: Seems to be impossible to set a NULL search_path

2022-07-13 Thread David G. Johnston
On Wed, Jul 13, 2022 at 10:02 PM Bryn Llewellyn  wrote:

>
>
>
> * The owner name | The *identifier* for the owner
> name +- Joe
>   | "Joe"*
>
> This is what I've been banging on about all the time. It seems that I'm
> the only person in the pgsql-general community who wants some proper terms
> of art
>

You might get a more trustworthy representation if this wasn't buried in a
thread about NULL and search_path.

But even with that I'd probably say while there are people who might find
the pureness of such a nomenclature to be appealing the project at this
point has no need to seek such perfection in this area.


> (as I used them in my column aliases) that let me say simply and clearly,
> why pg_proc.proowner::regrole gives a differently spelled answer than does
> pg_roles.role_name.
>

So you want "identifier" to be used whenever the printed or written value
allows the usage of structural double quotes and "name" when it doesn't?

IOW, you seem to have a long-winded way of saying, for the following
documentation paragraph:

"The input functions for these types allow whitespace between tokens, and
will fold upper-case letters to lower case, except within double quotes;
this is done to make the syntax rules similar to the way object names are
written in SQL. Conversely, the output functions will use double quotes if
needed to make the output be a valid SQL identifier."

Change "object names" to "object identifiers".

Feel free to tug on that string if you'd like, I'm not convinced enough
that it is even incorrect and suspect that style of writing quite prevalent
in the docs.  You'd need a larger sample size to argue with and propose
fixes for before a patch would likely be considered.

David J.