Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-06 Thread Bryn Llewellyn
I read this blog post: "PostgreSQL 11 – Server-side Procedures", Part 1 and Part 2. It starts with "Thanks to the work done by 2ndQuadrant contributors, we now have the ability to write Stored Procedures in PostgreSQL… once PostgreSQL 11 comes out". It focuses on doing txn control from a stored proc.

In my initial test of the code that it presents, I got the runtime error "invalid transaction termination" from the first-encountered txn control statement (which happened to be commit). I presently realized that this was because I had \set AUTOCOMMIT OFF in my .psqlrc startup file. Once I changed this, the code worked as the blog described.

I'm hoping that someone from 2ndQuadrant can answer my questions about what my tests show. They use a trivial table created thus:

create table t(n integer);

My first test uses this:

create or replace procedure p1()
  language plpgsql
as $$
begin
  insert into t(n) values(17);
end;
$$;

I test it first with AUTOCOMMIT OFF and then with it ON. The results are exactly as I'd expect.

When it's OFF, the effect of the insert shows up with a select immediately after the call finishes. Then rollback really does wipe out the effect of the insert, as is shown with a subsequent select.

And with AUTOCOMMIT ON, the effect of the insert again shows up with a select immediately after the call finishes. But now a subsequent rollback causes "WARNING: there is no transaction in progress". The effect of the insert was already committed.

I could add a sleep after the insert and then watch from a second session. For now, I'm assuming that the effect of AUTOCOMMIT ON takes place when the call finishes and not immediately after the insert.

My second test uses this:

create or replace procedure p2()
  language plpgsql
as $$
  declare
    levl_1 varchar(20);
    levl_2 varchar(20);
  begin
    -- This "rollback" is critical.
    -- Else "SET TRANSACTION ISOLATION LEVEL must be called before any query".
    rollback;
    set transaction isolation level repeatable read;
    show transaction isolation level into levl_1;
    insert into t(n) values(17);
    rollback;

    set transaction isolation level serializable;
    show transaction isolation level into levl_2;
    insert into t(n) values(42);
    commit;

    raise notice 'isolation level #1: %', levl_1;
    raise notice 'isolation level #2: %', levl_2;
  end;
$$;

(I got into this because I want to encapsulate all the logic that changes a table which has, in my functional spec, this table-level data rule: exactly one or two rows where column c1 has the value 'x'. The idea is to use the "serializable" isolation level and finish with a query that tests the rule. I'll do this in an infinite loop with a sleep so that, if a concurrent execution of the same proc pre-empts me and I get the "could not serialize…" error, I'll handle the exception and go round the loop again, exiting only when I don't get the exception.)

As mentioned above, I must call this with AUTOCOMMIT ON to avoid a runtime error. See my comment: the rollback as the proc's very first executable statement is critical. The code runs without error and shows the result that I expect.

I'm attaching txn_control_in_plpgsql_proc.sql. It's self-contained (it can be run time and again with the same outcome) and implements what I just described. I'm also attaching txn_control_in_plpgsql_proc.txt. This is the stdout output, copied from the terminal window, produced when I invoke psql to start my .sql script from the command line.

B.t.w., I have a real implementation of what I wanted to achieve and it seems to work perfectly.
So I’ve achieved my goal. But I hate the fact that I got there by trial and error and cannot rehearse a mental model that informs me why what I wrote is the proper approach.

Please describe the rules for all this and reassure me that I can rely on the behavior that I implemented by starting with rollback in my proc.
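
For concreteness, here is a minimal sketch (not my attached code, and not my real implementation) of the retry pattern I just described. The DML and the rule-checking query are placeholders, and, as discussed later in this thread, a serialization failure raised at commit time cannot be caught inside the proc, so this sketch retries only failures that surface while the inner block's statements run:

create or replace procedure change_t_with_retry()
  language plpgsql
as $$
begin
  loop
    -- End whatever txn is current, then start a fresh serializable one.
    rollback;
    set transaction isolation level serializable;
    begin
      -- ... DML that changes the table goes here ...
      -- ... then a query that checks the data rule and raises an
      --     exception if the rule is violated ...
      exit;  -- success: leave the retry loop
    exception
      when serialization_failure then
        perform pg_sleep(0.1);  -- back off, then go round the loop again
    end;
  end loop;
  commit;
end;
$$;

The inner begin/exception block contains no txn control, so it is legal; the rollback, set transaction, and commit all sit outside any block that has exception handlers.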

txn_control_in_plpgsql_proc.sql
Description: Binary data
$ psql -h 127.0.0.1 -p 5432 -d postgres -U postgres -f txn_control_in_plpgsql_proc.sql


Setup.



Test p1() with AUTOCOMMIT off

 n  

 17

 n 
---



Test p1() with AUTOCOMMIT on

 n  

 17

Expect "WARNING: there is no transaction in progress"
psql:txn_control_in_plpgsql_proc.sql:79: WARNING:  there is no transaction in 
progress


Test p2() with AUTOCOMMIT on. Works only in this mode.
Else "ERROR: invalid transaction termination"
from the first "rollback" statement.

psql:txn_control_in_plpgsql_proc.sql:95: NOTICE:  isolation level #1: 
repeatable read
psql:txn_control_in_plpgsql_proc.sql:95: NOTICE:  isolation level #2: 
serializable
 n  

 42

$ 

Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-06 Thread Bryn Llewellyn
Thanks Adrian. My aim with p1() was to show that its behavior, in each 
AUTOCOMMIT mode, aligns with my intuition. I’ve noticed the system-generated 
“begin” that you mention when I add this to my 
“/usr/local/var/postgres/postgresql.conf”:

log_statement = 'all'
log_directory = 'log'
logging_collector = on

and tail the most recent “/usr/local/var/postgres/log/postgresql-*.log” file. I 
assume that the “begin” is generated server-side—and not by the psql 
client-side program, or by other such clients.

However, the intuition that informs my understanding of the behavior of p1() 
lets me down for p2(). My starting assumption was that if I want to do txn 
control in a plpgsql proc, then I must grant it that ability by stopping doing 
txn control at the outer level.  But experiments—and what I’ve been told—tell 
me that I must do the opposite.

Here’s how I’ve tried to describe what I see for p2() with AUTOCOMMIT ON for 
myself:

1. my call p2() starts a txn.

2. However, during the execution of the proc, the usual autocommit behavior is 
programmatically turned off by explicit PostgreSQL code.

3. Other explicit PostgreSQL code makes “start transaction” inside a proc 
simply cause a runtime error under all circumstances. However, txns can be 
ended by “commit” or “rollback”. And new ones can be started—but only 
implicitly by executing a SQL statement that, as a top level SQL, would start a 
txn.

4. This is why “set transaction isolation level repeatable read” in my p2() is 
legal immediately after “rollback”—and produces the semantics I’d expect. At 
top level, and with autocommit turned on, it implicitly starts a txn—and you 
see the “begin” in the log file.

5. When the proc call ends, normal AUTOCOMMIT mode is turned on again, and a 
“commit” is issued automatically. This commit may, or may not, have something to 
commit—as you can see by running p3() with AUTOCOMMIT ON.

create or replace procedure p3()
  language plpgsql
as $$
begin
  insert into t(n) values(17);
  rollback;
  insert into t(n) values(42);
end;
$$;

After calling it, you see just one row with the value 42—and it’s already 
durable.

This is why I want the folks who invented this behavior to describe the correct 
plpgsql programmer’s mental model for me, with the terminology that they 
designed.

adrian.kla...@aklaver.com wrote:

See the psql documentation, on “AUTOCOMMIT”:

“When on (the default), each SQL command is automatically committed upon 
successful completion. To postpone commit in this mode, you must enter a BEGIN 
or START TRANSACTION SQL command. When off or unset, SQL commands are not 
committed until you explicitly issue COMMIT or END. The autocommit-off mode 
works by issuing an implicit BEGIN for you, just before any command that is not 
already in a transaction block and is not itself a BEGIN or other 
transaction-control command, nor a command that cannot be executed inside a 
transaction block (such as VACUUM).”

The way I understand it in your first case:
\set AUTOCOMMIT off
-- get clean start
begin;
delete from t;
commit;

call p1();
select n from t order by n;
rollback;
select n from t order by n;
rollback;

You have an implied BEGIN; before the begin; that you wrap the delete in. Therefore 
you can do the rollback;. In the AUTOCOMMIT on case there is only one transaction 
and it ends with the commit; after the delete. So when you attempt the rollback 
you get the error. REMEMBER: in plpgsql, BEGIN is not for transaction control; 
see the docs (43.8. Transaction Management).

Have not worked through the second case yet.
-- 
Adrian Klaver
adrian.kla...@aklaver.com



Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-07 Thread Bryn Llewellyn
I have a version of the code that I attached with my first email in this thread 
where I’ve added “raise notice” invocations to show the txid as my p1() and 
p2() execute. Everything that I see is consistent with what I wrote separately 
in reply to adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>:

<<
1. my call p2() starts a txn.

2. However, during the execution of the proc, the usual autocommit behavior is 
programmatically turned off by explicit PostgreSQL code.

3. Other explicit PostgreSQL code makes “start transaction” inside a proc 
simply cause a runtime error under all circumstances. However, txns can be 
ended by “commit” or “rollback”. And new ones can be started—but only 
implicitly by executing a SQL statement that, as a top level SQL, would start a 
txn.

4. This is why “set transaction isolation level repeatable read” in my p2() is 
legal immediately after “rollback”—and produces the semantics I’d expect. At 
top level, and with autocommit turned on, it implicitly starts a txn—and you 
see the “begin” in the log file.

5. When the proc call ends, normal AUTOCOMMIT mode is turned on again, and a 
“commit” is issued automatically. This commit may, or may not, have something to 
commit—as you can see by running p3() with AUTOCOMMIT ON.
>>

I’m interested only in the case that my proc does txn control—i.e. my p2()—and 
so I’m not interested in how my p1() behaves.

About your “I believe there is no reason ever to begin a procedure with a 
rollback”, I already explained why I did that. My higher goal is to take 
advantage of the “serializable” isolation level to safely enforce a multi-row 
data rule. And I want my plpgsql proc—following the time-honored philosophy for 
stored procs—to own the complete implementation. I discovered that doing 
“rollback” as the first executable statement in my proc allowed me to do “set 
transaction isolation level serializable”. And I’ve found no other way to do 
this. As I mentioned, the “real” version of my proc, written this way, does pass 
my functionality tests.

B.t.w., I noticed that “set transaction isolation level serializable” must be 
the very first statement after “rollback” (or “commit”). Even an invocation of 
txid_current() after the rollback and before the “set transaction” causes this 
runtime error: “SET TRANSACTION ISOLATION LEVEL must be called before any 
query”.
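
Here is a minimal illustration (a sketch with a hypothetical p4(), not my attached code):

create or replace procedure p4()
  language plpgsql
as $$
begin
  rollback;
  perform txid_current();  -- any query here, however trivial, ...
  set transaction isolation level serializable;  -- ... makes this statement fail
  commit;
end;
$$;

-- With AUTOCOMMIT on:
-- call p4();
-- ERROR:  SET TRANSACTION ISOLATION LEVEL must be called before any query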

About your PS, I’m new to this list—so forgive me if I didn’t follow proper 
etiquette. But as adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com> 
pointed out, the functionality under discussion here is part of the core 
PostgreSQL implementation.

On 07-Aug-2019, at 00:15, Luca Ferrari  wrote:

On Wed, Aug 7, 2019 at 12:19 AM Bryn Llewellyn  wrote:
> 1. my call p2() starts a txn.

In my opinion this is the point: a procedure must be the "owner" of the
transaction to issue transaction control statements. You can watch the
different behaviors by placing txid_current() here and there, before and
within p2, and re-running with autocommit off and on.
With autocommit = on, the call opens a transaction and the procedure is
the only thing within the transaction, so it has full control. With
autocommit off you gain an implicit begin, and the procedure is
possibly not the only thing you have within the transaction.
You can test it with:

testdb=# begin;
BEGIN
testdb=# call p2();
INFO:  TXID 134994
ERROR:  invalid transaction termination
CONTEXT:  PL/pgSQL function p2() line 9 at ROLLBACK
testdb=# rollback;
ROLLBACK


Having said that, I believe there is no reason ever to begin a
procedure with a rollback.

Here's a small example that performs a few commits and rollback
<https://github.com/fluca1978/PostgreSQL-11-Quick-Start-Guide/blob/master/Chapter04/Chapter04_Listing29.sql>

Luca

P.S:
I don't believe that asking on a public mailing list for a company to
answer is the right thing to do; you can always ask them directly on your
own.



Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-07 Thread Bryn Llewellyn
Thanks for your response, Kyotaro. I’m happy, now, to accept the rule that 
“call proc_that_does_txn_control()” is legal only when AUTOCOMMIT is ON. Esp. 
when I’m told (on Twitter, by 2ndQuadrant’s Peter Eisentraut) that this rule is 
“an implementation restriction, for the most part.” See HERE 
<https://twitter.com/petereisentraut/status/1158802910865756160>.

About your “In-procedure transaction control premises that no transaction is 
active before calling the procedure”… yes. Nevertheless, as the code that Umair 
Sahid showed us in the blog post that I referenced in my email that started 
this thread, you can indeed start and end transactions from an executing proc 
(as long as the session’s AUTOCOMMIT mode is ON).

So, logic tells me that once a txn is ended by issuing “commit” or “rollback”, 
you should be allowed to start the next one explicitly with “start 
transaction”. However, as mentioned, this causes a runtime error. I’ve decided 
simply not to care because I’ve discovered how to write my proc so that it 
passes the functionality tests that it ought to. I have to rely on the fact that 
the statements I’m interested in doing (including setting the isolation level) 
all implicitly start a txn and so “start transaction” isn’t needed!
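
A minimal sketch (a variant of the txid-reporting code I mentioned, not the attached file) of what I mean:

create or replace procedure p_txids()
  language plpgsql
as $$
declare
  t1 bigint;
  t2 bigint;
begin
  t1 := txid_current();  -- the txn that "call" started
  insert into t(n) values(17);
  commit;                -- ends that txn
  t2 := txid_current();  -- this query implicitly starts the next txn
  raise notice 'txid before commit: %, after commit: %', t1, t2;
end;
$$;

With AUTOCOMMIT ON, "call p_txids();" reports two different txids, showing that the statement after the commit implicitly started a new txn.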

Thanks to all who responded. The synthesis of what you all wrote helped me 
enormously. Case closed.

On 07-Aug-2019, at 00:26, Kyotaro Horiguchi  wrote:

Hello, Bryn.

At Tue, 6 Aug 2019 15:18:51 -0700, Bryn Llewellyn  wrote in 

> Here’s how I’ve tried to describe what I see for p2() with AUTOCOMMIT ON for 
> myself:
> 
> 1. my call p2() starts a txn.
> 
> 2. However, during the execution of the proc, the usual autocommit behavior 
> is programmatically turned off by explicit PostgreSQL code.
> 
> 3. Other explicit PostgreSQL code makes “start transaction” inside a proc 
> simply cause a runtime error under all circumstances. However, txns can be 
> ended by “commit” or “rollback”. And new ones can be started—but only 
> implicitly by executing a SQL statement that, as a top level SQL, would start 
> a txn.

In-procedure transaction control premises that no transaction is
active before calling the procedure.

https://www.postgresql.org/docs/11/sql-call.html

> If CALL is executed in a transaction block, then the called
> procedure cannot execute transaction control
> statements. Transaction control statements are only allowed if
> CALL is executed in its own transaction.

With AUTOCOMMIT=off, implicit BEGIN is invoked just before
CALLing p2() if no transaction is active. Thus p2() is always
called having a transaction active, which inhibits in-procedure
transaction control.

I'm not sure why you want to turn AUTOCOMMIT off, but even with
AUTOCOMMIT on, explicit BEGIN prevents per-command COMMIT, as you
perhaps know.

https://www.postgresql.org/docs/11/app-psql.html
> When on (the default), each SQL command is automatically
> committed upon successful completion. To postpone commit in
> this mode, you must enter a BEGIN or START TRANSACTION SQL
> command.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-07 Thread Bryn Llewellyn
Yes, I do believe that I understand this. But there’s no getting away from the 
fact that the AUTOCOMMIT mode, and what this implies, is a server-side 
phenomenon—at least as several PostgreSQL experts have assured me. For example, 
when you use client-side Python with the psycopg2 driver, then once you’ve done 
“my_session = psycopg2.connect(connect_str)”, you can then do 
“my_session.set_session(autocommit=False)”. And then everything we’ve been 
saying in the psql context now applies in that context—yes?

B.t.w., I’m guessing that the “begin” SQL command that you see in the log that 
I mentioned is actually issued by (some) clients—at least psql and 
Python-on-psycopg2—as an explicit call from the client. In other words, it 
isn’t the server that generates this. Does anyone know for sure how this works?

On 07-Aug-2019, at 11:56, Adrian Klaver  wrote:

On 8/7/19 11:46 AM, Bryn Llewellyn wrote:
> Thanks for your response, Kyotaro. I’m happy, now, to accept the rule that 
> “call proc_that_does_txn_control()” is legal only when AUTOCOMMIT is ON. Esp. 
> when I’m told (on twitter, by 2ndQuadrant’s Peter Eisentraut, that this rule 
> is “an implementation restriction, for the most part.” See HERE 
> <https://twitter.com/petereisentraut/status/1158802910865756160>.
> About your “In-procedure transaction control premises that no transaction is 
> active before calling the procedure”… yes. Nevertheless, as the code that 
> Umair Sahid showed us in the blog post that I referenced in my email that 
> started this thread, you can indeed start end end transactions from an 
> executing proc (as long as the session’s AUTOCOMMIT mode s ON).

The key is that the AUTOCOMMIT status is just a specific case of the general 
rule. The general rule being that a PROCEDURE cannot do transaction ending 
commands when it is called within an outer transaction. You can run into the 
same issue in other situations e.g. ORM's that start a transaction behind the 
scenes. In other words this is not psql specific.  As long as you understand 
the general rule then things become clearer.


-- 
Adrian Klaver
adrian.kla...@aklaver.com



Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-07 Thread Bryn Llewellyn
Thanks, Adrian.

On 07-Aug-2019, at 13:19, Adrian Klaver  wrote:

On 8/7/19 12:54 PM, Bryn Llewellyn wrote:
> Yes, I do believe that I understand this. But there’s no getting away from 
> the fact that the AUTOCOMMIT mode, and what this implies, is a server-side 
> phenomenon—at least as several PostgreSQL experts have assured me. For 
> example, when you use client-side Python with the psycopg2 driver, then once 
> you’ve done “my_session = psycopg2.connect(connect_str)”, you can then do 
> “my_session.set_session(autocommit=False)”. And then everything we’ve been 
> saying in the psql context now applies in that context—yes?

The server responds to instructions from the client.

General rule:
https://www.postgresql.org/docs/11/sql-call.html

"If CALL is executed in a transaction block, then the called procedure cannot 
execute transaction control statements. Transaction control statements are only 
allowed if CALL is executed in its own transaction.
"

> B.t.w., I’m guessing that the “begin” SQL command that you see in the log 
> that I mentioned is actually issued by (some) clients—at least psql and 
> Python-on-psycopg2—as an explicit call from the client. In other words, it 
> isn’t the server that generates this. Does anyone know for sure how this 
> works?

psql:
https://doxygen.postgresql.org/bin_2psql_2common_8c.html
Starting ~ line 1355

if (transaction_status == PQTRANS_IDLE &&
    !pset.autocommit &&
    !command_no_begin(query))
{
    results = PQexec(pset.db, "BEGIN");
    if (PQresultStatus(results) != PGRES_COMMAND_OK)

...

psycopg2:

https://github.com/psycopg/psycopg2/blob/master/psycopg/connection_int.c
~line 1294

> On 07-Aug-2019, at 11:56, Adrian Klaver  <mailto:adrian.kla...@aklaver.com>> wrote:
> On 8/7/19 11:46 AM, Bryn Llewellyn wrote:
>> Thanks for your response, Kyotaro. I’m happy, now, to accept the rule that 
>> “call proc_that_does_txn_control()” is legal only when AUTOCOMMIT is ON. 
>> Esp. when I’m told (on Twitter, by 2ndQuadrant’s Peter Eisentraut) that this 
>> rule is “an implementation restriction, for the most part.” See HERE 
>> <https://twitter.com/petereisentraut/status/1158802910865756160>.
>> About your “In-procedure transaction control premises that no transaction is 
>> active before calling the procedure”… yes. Nevertheless, as the code that 
>> Umair Sahid showed us in the blog post that I referenced in my email that 
>> started this thread, you can indeed start and end transactions from an 
>> executing proc (as long as the session’s AUTOCOMMIT mode is ON).
> The key is that the AUTOCOMMIT status is just a specific case of the general 
> rule. The general rule being that a PROCEDURE cannot do transaction ending 
> commands when it is called within an outer transaction. You can run into the 
> same issue in other situations e.g. ORM's that start a transaction behind the 
> scenes. In other words this is not psql specific.  As long as you understand 
> the general rule then things become clearer.
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>


-- 
Adrian Klaver
adrian.kla...@aklaver.com





Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-07 Thread Bryn Llewellyn
Thanks David. I’m relieved to hear that I’m not the only one who finds it hard 
to predict the behavior of some things in PostgreSQL just from reading the 
docs. Hypothesising and experimenting seem to be needed for the trickier cases.

You use the phrase "nested transaction”. This Google search gets no hits:

  "nested transaction" site:www.postgresql.org/docs/11/ 
<http://www.postgresql.org/docs/11/>

And as I’ve come to understand the phrase in the wider world, "nested 
transaction” is synonymous with “autonomous transaction”. PostgreSQL 11.2 
doesn’t support these. But I’ve heard that some future release might. So, as I 
see it, a single session can do only one transaction after another with no 
overlapping. My experiments using txid_current()—mentioned elsewhere in the 
offshoots from my original post—are consistent with this hypothesis.

On 07-Aug-2019, at 13:40, David G. Johnston <david.g.johns...@gmail.com> wrote:

On Wed, Aug 7, 2019 at 12:18 PM Luca Ferrari <fluca1...@gmail.com> wrote:
On Wed, Aug 7, 2019 at 8:28 PM Bryn Llewellyn <b...@yugabyte.com> wrote:

> B.t.w., I noticed that “set transaction isolation level serializable” must be 
> the very first statement after “rollback” (or “commit”). Even an invocation 
> of txid_current() after the rollback and before the ““set transaction” causes 
> this runtime error: “SET TRANSACTION ISOLATION LEVEL must be called before 
> any query”.
>

Well, SET TRANSACTION ISOLATION must be the very first instruction of
every transaction, not only within the case you describe.

IMHO, the documentation, probably in chapter 13, could use some exposition on 
this topic.

What is being described here is basically:

[implicit] BEGIN (using default transaction options)
CALL
>>>>BEGIN (with inherited default transactions options)
>>>>ROLLBACK (to get rid of the nested transaction setup by the call with the 
>>>>inherited default options)
>>>>START WITH OPTIONS
>>>>COMMIT;
<<<

Re: Why must AUTOCOMMIT be ON to do txn control in plpgsql procedure?

2019-08-08 Thread Bryn Llewellyn
setup_for_test_4.sql
Description: Binary data

that I've found no other way to meet my goal. I don’t understand, therefore, why some people (but not you, Daniel!) who’ve responded to my questions say that starting my proc with “rollback” is pointless.

I tried, earlier, to say “case closed”. I’ll say it again now.

On 08-Aug-2019, at 06:53, Daniel Verite <dan...@manitou-mail.org> wrote:

	Bryn Llewellyn wrote:

	B.t.w., I’m guessing that the “begin” SQL command that you see in the log
	that I mentioned is actually issued by (some) clients—at least psql and
	Python-on-psycopg2—as an explicit call from the client. In other words, it
	isn’t the server that generates this. Does anyone know for sure how this
	works?

Well, that's the point of Autocommit, and moreover it does nothing
else. Maybe you're still confused about this.

* Autocommit off = the client automatically adds a "BEGIN" when
it thinks a transaction must be started on behalf of the user.

* Autocommit on = the client does nothing.

The fact that "off" implies doing something and "on" implies not
interfering is counterintuitive, but that's how it is. Autocommit is
for compatibility with other databases. If it was only for Postgres, I
guess either it wouldn't exist in the first place or it should be
called "AutoStartTransactionBlock" or something like that, because
that's just what it really does.

Anyway, the server only knows whether a BEGIN has been issued.
It never knows or cares whether it has been added implicitly or explicitly,
which is why it can be quite confusing to reason about server-side
differences in terms of Autocommit, as you do in some of your previous
messages.

It should be stressed that Autocommit is not a parameter of the
session between Postgres and the SQL client, but rather it's a
parameter of the session between the user and their SQL client.
So when you're hypothesizing that a plpgsql block in a procedure
would look at this parameter or change it temporarily (your
points #2 and #5 in your analysis of p2's execution), you should
see that it's impossible, because on the server side, this parameter
just does not exist.

Best regards,

-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

psql behavior change on upgrade from version 12.x to 13.1

2021-02-09 Thread Bryn Llewellyn
Using a MacBook Pro with the current Big Sur—Version 11.2 (20D64).

I just upgraded to PostgreSQL 13.1. (Earlier, I was on 12.x.) Using psql, the 
behavior of ordinary copy-and-paste has change dramatically, and for the worse, 
 w.r.t. Version 12.

HAS ANYBODY ELSE SEEN WHAT I REPORT BELOW?

First observation

Now, when I copy a single line SQL command, terminated with semicolon and 
newline from the Text Edit app (with Command-C or the menu item) and then paste 
it into psql (with Command-V or the menu item), the newline isn't respected. I 
have to hit the return key by hand to see the effect. Moreover, the pasted line 
has a highlighted background.

Second observation

When I copy _several_ lines of SQL commands from the Text Edit app and then 
paste them into psql, none of the newlines are respected. (I still get the 
strange highlight.) Now when I hit the return key, I get errors like this:

  \i: extra argument "" ignored

for every single line.

This makes a standard working practice that my fingers have learned over 
decades suddenly completely unusable.

NOTE: the \i metacommand still works as it always did.



Re: psql behavior change on upgrade from version 12.x to 13.1

2021-02-09 Thread Bryn Llewellyn
paul.foers...@gmail.com wrote:

Hi Bryn,

> On 09. Feb, 2021, at 19:55, Bryn Llewellyn  wrote:
> 
> Using a MacBook Pro with the current Big Sur—Version 11.2 (20D64).
> 
> I just upgraded to PostgreSQL 13.1. (Earlier, I was on 12.x.) Using psql, the 
> behavior of ordinary copy-and-paste has changed dramatically, and for the 
> worse, w.r.t. Version 12.
> 
> HAS ANYBODY ELSE SEEN WHAT I REPORT BELOW?
> 
> First observation
> 
> Now, when I copy a single line SQL command, terminated with semicolon and 
> newline from the Text Edit app (with Command-C or the menu item) and then 
> paste it into psql (with Command-V or the menu item), the newline isn't 
> respected. I have to hit the return key by hand to see the effect. Moreover, 
> the pasted line has a highlighted background.
> 
> Second observation
> 
> When I copy _several_ lines of SQL commands from the Text Edit app and then 
> paste them into psql, none of the newlines are respected. (I still get the 
> strange highlight.) Now when I hit the return key, I get errors like this:
> 
> \i: extra argument "" ignored
> 
> for every single line.
> 
> This makes a standard working practice that my fingers have learned over 
> decades suddenly completely unusable.
> 
> NOTE: the \i metacommand still works as it always did.

well, I guess this is a macOS Big Sur issue and not a PostgreSQL problem. Or 
rather, it's a user (your) issue because it works as designed. ;-)

I'm on Big Sur 11.2 (20D64) too. From what I can tell, copy/paste hasn't 
changed. But then, I don't use Text Edit. I use both vi in a Terminal and 
TextMate. I don't like Text Edit because this is just as comfortable and 
capable as is Windows' Notepad...

Anyway, copying a line including the newline will require you to mark the whole 
line. Triple click on a line and you see the marking go beyond the last 
character in the line.

If you click (and hold!) while moving the mouse you will see the marking move 
accordingly. Note how it jumps to show the complete line up to the full right 
including the part beyond the last character if you move the mouse beyond that. 
If you want to include the new line in your copying to the clipboard, you'll 
need to include that in your marking.

Cheers,
Paul



We can cut Text Edit out of the picture entirely. If I enter, say, “select 
version();” in psql it of course works fine when I hit “return”. If I then 
copy-and-paste this from the psql screen (including the newline), then I see 
the same effect that I described above.

I should have said that I also use YugaByteDB in the same env. This has its own 
app like psql called ysqlsh. It's built on psql version 11.2. The behavior that 
I describe does NOT show up with ysqlsh.

I could have said that copy-and-paste from the Text Edit app to the terminal 
window O/S prompt works as it always has. 

The effect is specific to psql.

I can presently try PostgreSQL Version 12 (latest).









Re: psql behavior change on upgrade from version 12.x to 13.1

2021-02-09 Thread Bryn Llewellyn



On 09-Feb-2021, at 11:43, Tom Lane  wrote:

Bryn Llewellyn  writes:
> HAS ANYBODY ELSE SEEN WHAT I REPORT BELOW?

> First observation

> Now, when I copy a single line SQL command, terminated with semicolon and 
> newline from the Text Edit app (with Command-C or the menu item) and then 
> paste it into psql (with Command-V or the menu item), the newline isn't 
> respected. I have to hit the return key by hand to see the effect. Moreover, 
> the pasted line has a highlighted background.

> Second observation

> When I copy _several_ lines of SQL commands from the Text Edit app and then 
> paste them into psql, none of the newlines are respected. (I still get the 
> strange highlight.) Now when I hit the return key, I get errors like this:
>  \i: extra argument "" ignored

FWIW, I'm not seeing that here, with Big Sur 11.2 and up-to-date Postgres.

In a typical Postgres build, most of psql's input behavior is not
determined by psql itself, but by libreadline (or possibly libedit,
if PG was configured to use that instead).  I speculate that your
build switched to a newer version of readline or libedit, and it's
behaving differently than you're used to.  You could get some info
about this by applying "otool -L" to the psql executable.  On my
laptop I see

$ otool -L /Users/tgl/testversion/bin/psql
/Users/tgl/testversion/bin/psql:
   /Users/tgl/testversion/lib/libpq.5.dylib (compatibility version 5.0.0, 
current version 5.14.0)
   /usr/lib/libedit.3.dylib (compatibility version 2.0.0, current version 
3.0.0)
   /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 
1292.60.1)

of which the relevant bit for this purpose is "/usr/lib/libedit.3.dylib",
pointing to the Apple-supplied version of libedit.  Maybe you see
something else?

regards, tom lane

—

Here’s what I get when I do "otool -L /usr/local/bin/psql";

/usr/local/bin/psql:
/usr/local/lib/libpq.5.dylib (compatibility version 5.0.0, current 
version 5.13.0)
/usr/local/opt/readline/lib/libreadline.8.dylib (compatibility version 
8.0.0, current version 8.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 1292.0.0)

In other words, different from what you see. I'm an ordinary end user. I don't 
even think explicitly about “building" anything in the PostgreSQL system. I got 
into this mess (as I believe) because I did this:

brew update
brew upgrade

A colleague advised me to do this periodically as a hygiene measure. It had the 
surprising, and for me undesired, side-effect of upgrading my PostgreSQL 
installation from 12 to Version 13.1. I suppose that this triggered a build of 
some kind.



Re: psql behavior change on upgrade from version 12.x to 13.1

2021-02-09 Thread Bryn Llewellyn



On 09-Feb-2021, at 12:11, Tom Lane  wrote:

Bryn Llewellyn  writes:
> Here’s what I get when I do "otool -L /usr/local/bin/psql";

> /usr/local/bin/psql:
>   /usr/local/lib/libpq.5.dylib (compatibility version 5.0.0, current 
> version 5.13.0)
>   /usr/local/opt/readline/lib/libreadline.8.dylib (compatibility version 
> 8.0.0, current version 8.0.0)
>   /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
> version 1292.0.0)

Right, so that's using a version of libreadline that's supplied by
Homebrew (the /usr/local/opt path is the giveaway on that).

I don't know whether these things represent an intentional change
of libreadline's behavior in Homebrew's build, or a bug, but in
either case you should take the issue to the Homebrew support forums.
If it's intentional, I imagine there's a way to get the old behavior
back.

Also, libreadline is fairly configurable, so maybe this boils down
to some unintentional change in your ~/.inputrc ?

regards, tom lane

—

Thank you very much, Tom. It seems, then, that we have the “microscopic” 
explanation. I’ll have to do a fair bit of research to find out what to do to 
fix this problem.





Re: psql behavior change on upgrade from version 12.x to 13.1

2021-02-09 Thread Bryn Llewellyn


On 09-Feb-2021, at 12:49, Adrian Klaver  wrote:

On 2/9/21 12:19 PM, Bryn Llewellyn wrote:
> On 09-Feb-2021, at 12:11, Tom Lane  wrote:
> Bryn Llewellyn  writes:
>> Here’s what I get when I do "otool -L /usr/local/bin/psql";
>> /usr/local/bin/psql:
>>  /usr/local/lib/libpq.5.dylib (compatibility version 5.0.0, current 
>> version 5.13.0)
>>  /usr/local/opt/readline/lib/libreadline.8.dylib (compatibility version 
>> 8.0.0, current version 8.0.0)
>>  /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
>> version 1292.0.0)
> Right, so that's using a version of libreadline that's supplied by
> Homebrew (the /usr/local/opt path is the giveaway on that).
> I don't know whether these things represent an intentional change
> of libreadline's behavior in Homebrew's build, or a bug, but in
> either case you should take the issue to the Homebrew support forums.
> If it's intentional, I imagine there's a way to get the old behavior
> back.
> Also, libreadline is fairly configurable, so maybe this boils down
> to some unintentional change in your ~/.inputrc ?
>   regards, tom lane
> —
> > Thank you very much, Tom. It seems, then, that we have the 
“microscopic” explanation. I’ll have to do a fair bit of research to find out 
what to do to fix this problem.

I would start here:

https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/postgresql.rb

and contact the maintainer:

https://github.com/MikeMcQuaid

FYI, the formula points at another formula for readline:

def install
  ENV.prepend "LDFLAGS", "-L#{Formula["openssl@1.1"].opt_lib} -L#{Formula["readline"].opt_lib}"
  ENV.prepend "CPPFLAGS", "-I#{Formula["openssl@1.1"].opt_include} -I#{Formula["readline"].opt_include}"

which can be found here:

https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/readline.rb


-- 
Adrian Klaver
adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>

—

Thanks for the suggestion, Adrian.



Re: psql behavior change on upgrade from version 12.x to 13.1

2021-02-09 Thread Bryn Llewellyn
On 09-Feb-2021, at 14:16, raf  wrote:

On Tue, Feb 09, 2021 at 12:19:21PM -0800, Bryn Llewellyn  
wrote:

> 
> 
> On 09-Feb-2021, at 12:11, Tom Lane  wrote:
> 
> Bryn Llewellyn  writes:
>> Here’s what I get when I do "otool -L /usr/local/bin/psql";
> 
>> /usr/local/bin/psql:
>>  /usr/local/lib/libpq.5.dylib (compatibility version 5.0.0, current 
>> version 5.13.0)
>>  /usr/local/opt/readline/lib/libreadline.8.dylib (compatibility version 
>> 8.0.0, current version 8.0.0)
>>  /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
>> version 1292.0.0)
> 
> Right, so that's using a version of libreadline that's supplied by
> Homebrew (the /usr/local/opt path is the giveaway on that).
> 
> I don't know whether these things represent an intentional change
> of libreadline's behavior in Homebrew's build, or a bug, but in
> either case you should take the issue to the Homebrew support forums.
> If it's intentional, I imagine there's a way to get the old behavior
> back.
> 
> Also, libreadline is fairly configurable, so maybe this boils down
> to some unintentional change in your ~/.inputrc ?
> 
>   regards, tom lane
> 
> —
> 
> Thank you very much, Tom. It seems, then, that we have the
> “microscopic” explanation. I’ll have to do a fair bit of research to
> find out what to do to fix this problem.

This sounds exactly like changes that happened in
debian a while ago. I was taken by surprise as well,
but it's actually much better behaviour than previous
behaviour. It's nice to know that you have to confirm
the execution of a pasted shell command (especially
when pasting commands as root). It feels safer. You
might come to like it. But of course, the readline
library is probably configurable enough to change the
behaviour.

According to https://tiswww.case.edu/php/chet/readline/rluserman.html,
this could be what you're looking for:

 enable-bracketed-paste
 When set to `On', Readline will configure the
 terminal in a way that will enable it to insert each
 paste into the editing buffer as a single string of
 characters, instead of treating each character as if
 it had been read from the keyboard. This can prevent
 pasted characters from being interpreted as editing
 commands. The default is `On'.

So try putting this in your ~/.inputrc file:

 set enable-bracketed-paste off

cheers,
raf

——

Thanks, raf. I didn’t have a ~/.inputrc file. So I created one with the single 
line that you mentioned. It worked like a charm. Now life is back to normal.





Unexpected result from ALTER FUNCTION— looks like a bug

2022-04-19 Thread Bryn Llewellyn
SUMMARY

This part of the syntax diagram for "alter function":

ALTER FUNCTION name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ] action 
[ … ]

says that the first "action" can be followed (without punctuation) by zero, 
one, or many other actions. A semantic rule says that no particular action can 
be specified more than once. My tests used these possible actions:

SECURITY { INVOKER |  DEFINER }
SET configuration_parameter TO value 
IMMUTABLE | STABLE | VOLATILE
PARALLEL { UNSAFE | RESTRICTED | SAFE }

The values of the properties set this way can be seen with a suitable query 
against "pg_catalog.pg_proc". (See the complete testcase below.) Suppose that 
the history of events shows this status for the function s1.f():

 name | type | security | proconfig | volatility | parallel
------+------+----------+-----------+------------+----------
 f    | func | invoker  |           | volatile   | unsafe

This statement:

alter function s1.f()
security definer
immutable
parallel restricted;

brings this new status:

 name | type | security | proconfig | volatility | parallel
------+------+----------+-----------+------------+------------
 f    | func | definer  |           | immutable  | restricted

confirming that the three specified changes have been made using just a single 
"alter function" statement.

However, when "SET configuration_parameter" is specified along with other 
changes, then the "parallel" specification (but only this) is ignored. The 
other three specifications are honored.

alter function s1.f()
security invoker
set timezone = 'UTC'
stable
parallel safe;

It brings this new status:

 name | type | security | proconfig      | volatility | parallel
------+------+----------+----------------+------------+------------
 f    | func | invoker  | {TimeZone=UTC} | stable     | restricted

This is the bug.

Notice that with "alter procedure", the semantic difference between a procedure 
and a function means that you cannot specify "parallel" here, and so you can't 
demonstrate the bug here.

SELF-CONTAINED, RE-RUNNABLE TESTCASE tested using PG Version 14.1


-- demo.sql
---

\o spool.txt

\c postgres postgres
set client_min_messages = warning;
drop database if exists db;
create database db owner postgres;

\c db postgres
set client_min_messages = warning;
drop schema if exists public cascade;
create schema s1 authorization postgres;

\i prepare-qry.sql

create function s1.f()
  returns int
  language plpgsql
as $body$
begin
  return 0;
end;
$body$;

\t off
execute qry;

alter function s1.f()
security definer
immutable
parallel restricted;

\t on
execute qry;

-- Here is the bug. The test is meaningful only for a function.
alter function s1.f()
security invoker
set timezone = 'UTC'
stable
parallel safe;

execute qry;

\o


-- prepare-qry.sql
--

drop view if exists s1.subprograms cascade;
create view s1.subprograms(
  name,
  pronamespace,
  type,
  security,
  proconfig,
  volatility,
  parallel)
as
select
  proname::text as name,
  pronamespace::regnamespace::text,
  case prokind
when 'a' then 'agg'
when 'w' then 'window'
when 'p' then 'proc'
else 'func'
  end,
 case
when prosecdef then 'definer'
else 'invoker'
  end,
  coalesce(proconfig::text, '') as proconfig,
  case
when provolatile = 'i' then 'immutable'
when provolatile = 's' then 'stable'
when provolatile = 'v' then 'volatile'
  end,
  case
when proparallel = 'r' then 'restricted'
when proparallel = 's' then 'safe'
when proparallel = 'u' then 'unsafe'
  end
from pg_catalog.pg_proc
where
  proowner::regrole::text = 'postgres' and
  pronamespace::regnamespace::text = 's1' and
  pronargs = 0;

prepare qry as
select
  rpad(name,4) as name,
  rpad(type,4) as type,
  rpad(security,8) as security,
  rpad(proconfig,  55) as proconfig,
  rpad(volatility, 10) as volatility,
  rpad(parallel,   10) as parallel
from s1.subprograms
where type in ('func', 'proc')
and   pronamespace::regnamespace::text = 's1'
order by name;


spool.txt
-

 name | type | security | proconfig | volatility | parallel
------+------+----------+-----------+------------+----------
 f    | func | invoker  |

Re: Unexpected result from ALTER FUNCTION— looks like a bug

2022-04-20 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
>> david.g.johns...@gmail.com wrote:
>> 
>> Might I suggest the following...
> 
> Actually, the reason proconfig is handled differently is that it's a 
> variable-length field, so it can't be represented in the C struct that we 
> overlay onto the catalog tuple...

Thanks to all who responded. Tom also wrote this, earlier:

> In any case, Bryn's right, the combination of a SET clause and a PARALLEL 
> clause is implemented incorrectly in AlterFunction.

I'm taking what I've read in the responses to mean that the testcase I showed 
is considered to be evidence of a bug (i.e. there are no semantic restrictions) 
and that fix(es) are under consideration.

I agree that, as long as you know about the bug, it's trivial to achieve your 
intended effect using two successive "alter function" statements (underlining 
the fact that there are indeed no semantic restrictions). I hardly have to say 
that the point is the risk that you silently don't get what you ask for—and 
might then need a lot of effort (like I had to spend) to work out why.
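
For the record, here is what I mean by two successive statements (a sketch of the workaround, using the same s1.f() as in the testcase above): keep the "set" clause in its own statement, away from the "parallel" clause.

alter function s1.f()
  security invoker
  stable
  parallel safe;

alter function s1.f()
  set timezone = 'UTC';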



"create function... depends on extension..." not supported. Why?

2022-04-26 Thread Bryn Llewellyn
Note: I’m asking because the answer to the question “Why isn’t X supported?” is 
always useful. For example, if supporting it would imply a logical 
impossibility that I’m too slow to spot, it helps me when someone explains what 
I failed to realize. Equally, it helps me to know when the answer is “It’s just 
a historical accident. It could have been supported. But, now that it isn’t, it 
doesn’t seem worth the effort to bridge that gap” because this shows me that my 
existing mental model is sound.

The background for my question here is that among the documented changes that 
you can make using “alter function”

https://www.postgresql.org/docs/current/sql-alterfunction.html

you can specify all of them at “create time” too (but possibly, like “owner”, 
only implicitly) except for “depends on extension”. When you know in advance 
that you want to set this property, you need two statements:

create function f()
 returns int
 language plpgsql
as $body$
begin
 return 42;
end;
$body$;

alter function f()
depends on extension pgcrypto;

Why is this not supported:

create function f()
 returns int
 language plpgsql
 depends on extension pgcrypto
as $body$
begin
 return 42;
end;
$body$;





Re: "create function... depends on extension..." not supported. Why?

2022-04-26 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Note: I’m asking because the answer to the question “Why isn’t X supported?” 
>> is always useful. For example, if supporting it would imply a logical 
>> impossibility that I’m too slow to spot, it helps me when someone explains 
>> what I failed to realize. Equally, it helps me to know when the answer is 
>> “It’s just a historical accident. It could have been supported. But, now 
>> that it isn’t, it doesn’t seem worth the effort to bridge that gap” because 
>> this shows me that my existing mental model is sound.
> 
> Here is the commit that brought the feature into existence (it includes a 
> link to the archives for discussion from which you can read or infer things).
> 
> https://github.com/postgres/postgres/commit/f2fcad27d59c8e5c48f8fa0a96c8355e40f24273

The discussion has diverging threads and very many turns. I think that I 
managed to skim through the entire tree. As I read it, the discussion was 
entirely about the semantics of the proposed dependency of a function (or 
procedure) upon an extension. The idea to establish such a dependency using 
“alter function” came up quite early in the discussion. It seems that 
establishing it at “create function” time was never considered.

Unless anybody corrects me, I’ll conclude that it’s perfectly feasible to 
establish the dependency at “create function” time. This would meet an obvious 
niceness goal (symmetry and guessability). It would also increase the 
possibility for component rule re-use in the presentation of the syntax rules. 
But I don’t s’pose that the effort of changing anything here would be 
cost-effective.



Re: "create function... depends on extension..." not supported. Why?

2022-04-26 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> The discussion has diverging threads and very many turns. I think that I 
>> managed to skim through the entire tree. As I read it, the discussion was 
>> entirely about the semantics of the proposed dependency of a function (or 
>> procedure) upon an extension. The idea to establish such a dependency using 
>> “alter function” came up quite early in the discussion. It seems that 
>> establishing it at “create function” time was never considered.
> 
> Probably not.  Just for the record, it's not true that CREATE and ALTER 
> always have the same set of options.  An obvious counterexample is that you 
> can't set the owner to someone different from yourself during CREATE. There 
> may be others.
> 
> I suppose that "DEPENDS ON EXTENSION" was modeled after the commands to 
> control extension membership, which likewise exist only in ALTER form because 
> CREATE's behavior for that is hard-wired.  If you wanted to hand-wave a lot, 
> you could maybe claim that ownership and extension membership/dependency are 
> similar kinds of relationships and so it makes sense that the command 
> structures for manipulating them are similar.  But TBH that would probably be 
> reverse-engineering an explanation.  I think that "we didn't bother" is more 
> nearly the situation.

Thanks, Tom. Just as I’d hoped, I found your “we didn’t bother” reply very 
helpful. I take your point about the inevitability of some differences between 
what “create function” and “alter function” can express.

B.t.w., in Oracle Database, you can create a schema object with any owner as 
long as you have the object-type-specific “ANY” privilege. (You need an 
ordinary object-type-specific privilege just to create objects that you own 
yourself.) The PG model is as different from the Oracle model as it could be in 
the general area of creating, altering, and dropping schema objects. I see that 
“alter” to change the owner only after the fact, requiring as it does a 
superuser, is [almost] a forced choice in PG.

It would seem, though, that syntax could be invented to allow a superuser to 
create an object of any type with any owner. But I s’pose that the usability 
benefit that this would bring would be marginal and it might even tempt bad 
practices.

Meaning of "constant" not honored when the variable is used as the actual for a proc's OUT formal paameter

2022-05-03 Thread Bryn Llewellyn
I just did this using PG 14.2:

create procedure p(a out int)
  language plpgsql
as $body$
begin
  a := 42;
end;
$body$;
  
do $body$
declare
  a constant int := 0;
begin
  call p(a);
  raise info '%', a::text;
end;
$body$;

The DO block runs without error and reports "INFO:  42". This is an unambiguous 
semantic error because "a" is declared "constant".

Is this a known issue?

Replace "a" with the literal "37" in this test:

do $body$
begin
  call p(37);
  raise info '%', a::text;
end;
$body$;

This causes the expected runtime error 42601:

procedure parameter "a" is an output parameter but corresponding argument is 
not writable.

B.t.w., error 42601 is mapped to the name "syntax_error" in PL/pgSQL. I'd say 
that this is its own distinct bug. The syntax is fine. It's a semantic error.

Notice that the test can be trivially transcribed to Oracle Database's PL/SQL 
as this SQL*Plus script:

create procedure p(a out integer)
  authid definer
as
begin
  a := 42;
end;
/
declare
  a /*constant*/ int := 0;
begin
  p(a);
  DBMS_Output.put_line('a: '||to_char(a));
end;
/

When "constant" is commented out (as presented), the anonymous block runs 
without error and outputs "a: 42". But when "constant" is uncommented, the 
attempt causes this error:

PLS-00363: expression 'A' cannot be used as an assignment target

This is the proper report of what clearly is a semantic error. PG should do the 
same.

B.t.w., this happens to be a compilation error in ORCL and not a run-time 
error. But that's an entirely different story and reflects the fundamentally 
different compilation and execution models for anonymous blocks, user-defined 
functions, and user-defined procedures between ORCL and PG.



Re: Meaning of "constant" not honored when the variable is used as the actual for a proc's OUT formal paameter

2022-05-03 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> The DO block runs without error and reports "INFO:  42". This is an 
>> unambiguous semantic error because "a" is declared "constant"… Is this a 
>> known issue?
> 
> It is, see 
> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=ccd10a9bf

Thanks for the instant response, Tom. It's uncanny that you spotted this on 
Saturday 30-Apr and I spotted it just three days later on Tuesday 3-May. One of 
us must be psychic.

"A transaction cannot be ended inside a block with exception handlers."

2022-05-06 Thread Bryn Llewellyn
The PG doc section 43.8. Transaction Management:
https://www.postgresql.org/docs/current/plpgsql-transactions.html

says "A transaction cannot be ended inside a block with exception handlers." 
It's easy to demonstrate the restriction by adding this just before the final 
"end;" in the simple example at the start of the section:

exception
  when invalid_transaction_termination then
raise info 'invalid_transaction_termination caught';

The procedure now terminates with "invalid_transaction_termination caught".
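
For completeness, here is the whole demonstration as a self-contained sketch. It assumes the doc section's own transaction_test1() example (a table test1 with a single int column), with just the exception clause added:

create table test1 (a int);

create procedure transaction_test1()
  language plpgsql
as $$
begin
  for i in 0..9 loop
    insert into test1 (a) values (i);
    if i % 2 = 0 then
      commit;
    else
      rollback;
    end if;
  end loop;
exception
  when invalid_transaction_termination then
    raise info 'invalid_transaction_termination caught';
end;
$$;

call transaction_test1();
-- INFO:  invalid_transaction_termination caught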

The "Transaction Management" section is new, in the "PL/pgSQL - SQL Procedural 
Language" chapter, in Version 11. The caveat is there. And it's been there ever 
since. I'd been hoping that the restriction would have been lifted by version 
14.

I want to demonstrate how to meet this requirement:

«
Encapsulate each business function in a user-defined subprogram that hides all 
the implementation details like table names and the SQL statements that 
manipulate their contents so that they cannot be seen using SQL issued from the 
client. Further, don't allow raw errors to escape to the client. Rather, if an 
expected error occurs (like a unique key violation), then report this as an 
application-specific code that translates to, for example, "This nickname is 
already taken. Choose a different one." And if an "others" error occurs 
(typically because the programmer forgot to cater for it—like a too-wide 
varchar value) insert diagnostic info into an incident log table and return an 
"unexpected error" application-specific code together with the incident ID so 
that it can be reported to Support.
»

I've written proof-of-concept code that shows how to meet this requirement for 
most scenarios. But it seems to be impossible to meet the requirement for 
errors that occur at commit time. Such errors occur, for example, when two 
"serializable" sessions contend. It's easy to provoke a commit-time error in a 
single session demo with an "initially deferred" constraint trigger. (See the 
code at the end.) Another example would be an "initially deferred" FK 
constraint where an application code bug occasionally fails to meet the 
required conditions before the commit.

Is there simply no way that inserts into table "t" in my example can be 
encapsulated in PL/pgSQL so that the error from the failing trigger can be 
handled there rather than escaping, raw, to the client?


-- "initially deferred" constraint trigger demo

create table t(k int primary key);

create function trg_fn()
  returns trigger
  language plpgsql
as $body$
declare
  bad constant boolean not null :=
(select exists(select 1 from t where k = 42));
begin
  -- Uncomment to observe when trg fires.
  -- raise info 'trg fired';
  if bad then
    assert 'false', 'trigger trg caused exception';
  end if;
  return null;
end;
$body$;

create constraint trigger trg
after insert on t
initially deferred
for each row
execute function trg_fn();


-- Do by hand, statement by statement.

-- OK
-- Notice that "trg" fire, five times, at "commit" time.
start transaction;
insert into t(k) values(1), (2), (3), (4), (5);
commit;

-- "ERROR: tigger trg caused exception" occurs at "commit" time.
start transaction;
insert into t(k) values(42);
commit;





Re: "A transaction cannot be ended inside a block with exception handlers."

2022-05-06 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Is there simply no way that inserts into table "t" in my example can be 
>> encapsulated in PL/pgSQL so that the error from the failing trigger can be 
>> handled there rather than escaping, raw, to the client?
> 
> Any potential solution to this problem will involve writing a stored 
> procedure (CREATE PROCEDURE) which becomes the API layer for the application 
> and each one probably issues a commit just prior to returning control to the 
> calling application.  Its exception handler can transform the at-commit 
> errors to application errors and then return control to the calling 
> application - which then needs to handle a clean return or an 
> application-level error return.

Eh? A stored procedure that has a commit like you suggest cannot have an 
exception handler like you also suggest. That's what the doc says. I quoted the 
sentence in my "subject" line. More carefully stated, if you have such a 
procedure, then any txn control statement that it executes will cause this:

ERROR:  2D000: cannot roll back while a subtransaction is active

Sure, you can handle this. But that gets you nowhere. The procedure will always 
end this way and never do what you wanted it to do. Sorry if my email wasn't 
clear.

I'll be delighted if somebody can show me working PL/pgSQL code that uses the 
setup that I showed and takes this as a starting point:

create procedure do_insert(good in boolean)
  language plpgsql
as $body$
begin
  case good
    when true then
      for j in 10..20 loop
        insert into t(k) values(j);
      end loop;
    when false then
      insert into t(k) values(42);
  end case;
end;
$body$;

As presented, it ends like this when it's called with "false":

ERROR:  P0004: tigger trg caused exception
CONTEXT:  PL/pgSQL function trg_fn() line 9 at ASSERT
LOCATION:  exec_stmt_assert, pl_exec.c:3918

Modify it along the lines that David suggests so that when it's called with 
"true", it completes silently and makes the intended changes. And when it's 
called with "false", it reports that it handled the P0004 error via "raise 
info" and then returns without error.

The rest (presumably with an OUT parameter) is easy.




Re: "A transaction cannot be ended inside a block with exception handlers."

2022-05-06 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> I want to demonstrate how to meet this requirement:
>> 
>> «
>> Encapsulate each business function in a user-defined subprogram that hides 
>> all the implementation details like table names and the SQL statements that 
>> manipulate their contents so the they cannot be seen using SQL issued from 
>> the client. Further, don't allow raw errors to escape to the client. Rather, 
>> if an expected error occurs (like a unique key violation), then report this 
>> as an application-specific code that translates to, for example, "This 
>> nickname is already taken. Choose a different one." And if an "others" error 
>> occurs (typically because the programmer forgot to cater for it—like a 
>> too-wide varchar value) insert diagnostic info into an incident log table 
>> and return an "unexpected error" application-specific code together with the 
>> incident ID so that it can be reported to Support.
>> »
>> 
>> I've written proof-of-concept code that shows how to meet this requirement 
>> for most scenarios. But it seems to be impossible to meet the requirement 
>> for errors that occur at commit time.
> 
> So ... avoid those? It seems like it's only a problem if you use deferred 
> constraints, and that's not a necessary feature.
> 
>> Is there simply no way that inserts into table "t" in my example can be 
>> encapsulated in PL/pgSQL so that the error from the failing trigger can be 
>> handled there rather than escaping, raw, to the client?
> 
> If you want a strict view of that you probably need to be doing the 
> encapsulation on the client side. There's nothing you can do on the server 
> side that would prevent, say, network-connection failures "escaping" to the 
> client.  And that's actually one of the harder cases to deal with: if the 
> connection drops just after you issue COMMIT, you can't tell whether the 
> transaction got committed.

I'll take this to mean that there is no plan for PG ever to allow txn control 
in a PL/pgSQL block that has an exception handler. Please tell me if I 
misunderstood.

Your point about a certain class of server-side error is well taken. (In Oracle 
Database, at least, errors like that cannot be handled in an exception block; 
they inevitably escape to the client.) Is it the same in PG?

But I'm not convinced by this "what-about-ism" argument that it's pointless to 
handle those errors that allow it so that hackers get only the bare minimum 
information on things like schema-object names and the like. The more that 
hackers know about a system's internals, the better are their chances of doing 
evil.

You said "it's only a problem if you use deferred constraints, and that's not a 
necessary feature". My example was contrived. But some requirements (like 
entity level constraints) require commit-time checking. Like, say, a department 
must have just one or two staff whose job is 'Admin'. (This is presumably why 
the feature exists.) Using the serializable isolation level is another possible 
approach. But doing that can also lead to commit-time errors.

Is there really no sympathy for what I want to achieve?



The P0004 assert_failure exception seems to be unhandleable

2022-05-08 Thread Bryn Llewellyn
I just stumbled on the fact that the "assert_failure" exception seems to be 
unhandleable. My test is at the end.

Is this intended?

I looked at the section:

«
43.9.2. Checking Assertions
https://www.postgresql.org/docs/current/plpgsql-errors-and-messages.html#PLPGSQL-STATEMENTS-ASSERT
»

It says this:

«
Note that ASSERT is meant for detecting program bugs, not for reporting 
ordinary error conditions. Use the RAISE statement, described above, for that.
»

But it takes quite a stretch of the imagination to infer that this means that 
the "assert_failure" exception cannot be handled.

B.t.w. this (in the same "43.9. Errors and Messages" section) looks like a typo:

«
If no condition name nor SQLSTATE is specified in a RAISE EXCEPTION command, 
the default is to use ERRCODE_RAISE_EXCEPTION (P0001).
»

The spelling "ERRCODE_RAISE_EXCEPTION" makes it look like a built-in 
function. I believe that this is meant:

«
If no condition name nor SQLSTATE is specified in a RAISE EXCEPTION command, 
the outcome is as if this:

   ERRCODE = "RAISE_EXCEPTION"

or this:

   ERRCODE = "P0001"

was used
»
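
In other words, these two spellings (a sketch) report the same SQLSTATE:

raise exception using errcode = 'raise_exception', message = 'Not allowed!';
raise exception using errcode = 'P0001',           message = 'Not allowed!';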

--
-- The test

create function demo_outcome(which in text)
  returns text
  language plpgsql
as $body$
declare
  err   text not null := '';
  msg   text not null := '';
  hint  text not null := '';
  n     int  not null := 0;
begin
  case which
    when 'OK' then
      n := 42;

    when 'null_value_not_allowed' then
      n := null;

    when 'raise_exception' then
      raise exception using
        errcode = 'raise_exception',
        message = 'U1234: Not allowed!',
        hint    = 'Do something else!';

    when 'assert_failure' then
      assert false, 'Assert failed';
  end case;
  return 'no error';
exception when others then
  get stacked diagnostics
    err  = returned_sqlstate,
    msg  = message_text,
    hint = pg_exception_hint;
  return 'Handled: '||err||' | '||msg||' | '||hint;
end;
$body$;

\set VERBOSITY verbose
\t on
\o spool.txt
select demo_outcome('OK');
select demo_outcome('null_value_not_allowed');
select demo_outcome('raise_exception');
\o
\t off

It outputs this to "spool.txt".

 no error

 Handled: 22004 | null value cannot be assigned to variable "n" declared NOT 
NULL | 

 Handled: P0001 | U1234: Not allowed! | Do something else!

But doing this:

select demo_outcome('assert_failure');

causes this outcome:

ERROR:  P0004: Assert failed
CONTEXT:  PL/pgSQL function demo_outcome(text) line 22 at ASSERT
LOCATION:  exec_stmt_assert, pl_exec.c:3918



Re: "A transaction cannot be ended inside a block with exception handlers."

2022-05-08 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> I do understand better now and indeed the current limitation has no 
> workaround that I can come up with. I was hoping maybe subblocks would work 
> but its pretty clear cut that to catch an error at the commit command you 
> must catch it within a block and the commit error will be raised first.
> 
>> b...@yugabyte.com wrote:
>> 
>> I'll take this to mean that there is no plan for PG ever to allow txn 
>> control in a PL/pgSQL block that has an exception handler. Please tell me if 
>> I misunderstood.
> 
> You misunderstand how the development of PostgreSQL works generally. But, I 
> suppose as a practical matter if you aren't going to spearhead the change you 
> might as well assume it will not be possible until it is.
>  
>> Is there really no sympathy for what I want to achieve?
> 
> I personally have sympathy, and if you submitted a patch to improve matters 
> here I don't see anyone saying that it would be unwanted.
> 
> As for the circular dependency breaking use of deferred constraints (or your 
> explicitly deferred triggers), you have the SET CONSTRAINTS ALL IMMEDIATE 
> command:
> 
> postgres=# call do_insert(false);
> ERROR:  tigger trg caused exception
> CONTEXT:  PL/pgSQL function trg_fn() line 9 at ASSERT
> SQL statement "SET CONSTRAINTS ALL IMMEDIATE"
> PL/pgSQL function do_insert(boolean) line 12 at SQL statement
> postgres=# create or replace procedure do_insert(good in boolean)
>   language plpgsql
> as $body$
> begin
> begin
>   case good
> when true then
>   for j in 10..20 loop
> insert into t(k) values(j);
>   end loop;
> when false then
>   insert into t(k) values(42);
>   end case;
> SET CONSTRAINTS ALL IMMEDIATE;
> commit;
> end;
> exception
> when invalid_transaction_termination then
> raise exception 'caught invalid';
> when OTHERS then
> raise notice 'others - ok';
> commit;
> end;

Thanks for your "how the development of PostgreSQL works generally" comment. 
I'm afraid that my full-time job doesn't allow me time to study the PG 
codebase—and, anyway, it's more than thirty years since I regularly wrote C 
programs, so I wouldn't now be able to write a patch for consideration.

To the present use-case…

I was thinking too narrowly and so I asked the wrong question (for my 
"initially deferred" constraint trigger example). I had carelessly assumed that 
such a trigger would fire only at "commit" time. But you reminded me of the 
"set constraints all immediate" option. Thanks.

I'd also used "assert" in my toy trigger to make it end with an error. But I've 
now realized that the exception that this causes is unhandleable. See my 
separate email to pgsql-general@lists.postgresql.org:

The P0004 assert_failure exception seems to be unhandleable
https://www.postgresql.org/message-id/5a915380-789d-4448-89f4-e3f945c15...@yugabyte.com

I re-wrote my test, with a proper "raise exception" in the trigger and with 
"set constraints all immediate" in the PL/pgSQL block that can cause the 
trigger to fire. I also re-wrote the test as a function that returns the 
outcome status and allows more choices of "case statement" legs. See below for 
the complete, self-contained code.

It now works as I wanted to. So thank you very much, David, for showing me the 
technique that I needed to get that narrow example to work.

Now to your comment "the current limitation has no workaround that I can come 
up with". Tom has (I think) confirmed that there is no workaround. And I fear 
that the "set constraints all immediate" approach is unsafe without using 
"serializable". But when "serializable" is used, sometimes errors don't occur 
until commit time. Without using "serializable", the following race condition 
can occur (sketched in script form after the steps below). Imagine changing the 
definition of "bad" in my example to:

(select exists(select 1 from t where k = 17))
and
(select exists(select 1 from t where k = 42))

- Two concurrent sessions both issue "set constraints all immediate" from a 
PL/pgSQL block as the last executable statement before the top-level PL/pgSQL 
call completes.

- One session inserts "17" and the other inserts "42".

- Each executes the constraint test before either commits—and so the test 
succeeds in each session. So the net effect of these changes violates the data 
rule that a trigger seeks to enforce. This is really just the same race 
condition that's used to show why entity-level constraints cannot be safely 
enforced by ordinary triggers that are not set "initially deferred".
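
Here is that interleaving in two-session script form (a sketch, re-using the
"t" and "trg" setup from my earlier email with the modified definition of
"bad"):

-- BLUE session
start transaction;
insert into t(k) values (17);
set constraints all immediate;  -- "trg" fires; it sees only 17; the check passes

-- RED session
start transaction;
insert into t(k) values (42);
set constraints all immediate;  -- "trg" fires; it sees only 42; the check passes

-- BLUE session
commit;

-- RED session
commit;

-- "t" now contains both 17 and 42: the rule that "trg" exists to enforce is violated.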

So... in the bigger picture, when I want to honor the principle of modular 
software construction and hide all implementation details behind a PL/pgSQL 
API, I still have no general solution when, for example, entity level 
constraints are to be enforced.

It seems that what is needed is a commit-time event trigger. Could this be 
feasible?

Presumably, the mechanism must be at work at commit time, down in the 
implementation, with "initially de

Re: The P0004 assert_failure exception seems to be unhandleable

2022-05-08 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> «
>> Note that ASSERT is meant for detecting program bugs, not for reporting 
>> ordinary error conditions. Use the RAISE statement, described above, for 
>> that.
>> »
>> 
>> But it takes quite a stretch of the imagination to infer that this means 
>> that the "assert_failure" exception cannot be handled.
> 
> Agreed.  But as the pl/pgsql section “trapping errors” notes:
> 
> “The special condition name OTHERS matches every error type except 
> QUERY_CANCELED and ASSERT_FAILURE. (It is possible, but often unwise, to trap 
> those two error types by name.)”
> 
> i.e.,  you must trap it explicitly, not as part of others.

Thanks again, David. I don't yet know my way around the overall PG doc well 
enough to make sure that I read everything that relates to my current interest. 
Thanks for reminding me about this:

43.6.8. Trapping Errors
https://www.postgresql.org/docs/14/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

in the section:

43.6. Control Structures
https://www.postgresql.org/docs/14/plpgsql-control-structures.html

in chapter:

Chapter 43. PL/pgSQL — SQL Procedural Language
https://www.postgresql.org/docs/14/plpgsql.html

All is clear now. And the caveat "It is possible, but often unwise, to trap 
those two error types by name" makes sense.
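
For the record, a minimal sketch of trapping it by name (the function name is
just illustrative):

create function trap_assert()
  returns text
  language plpgsql
as $body$
begin
  assert false, 'Assert failed';
  return 'no error';
exception
  when assert_failure then
    return 'Handled: '||sqlstate||' | '||sqlerrm;
end;
$body$;

select trap_assert();
-- Handled: P0004 | Assert failed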



Deferred constraint trigger semantics

2022-05-10 Thread Bryn Llewellyn
SUMMARY
===

I looked at the sections "CREATE TRIGGER" and "Chapter 39. Triggers" in the 
Current PG doc. But I failed to find any information about the semantics of the 
deferred constraint trigger or about the use cases that motivated this feature. 
Nor could I find any code examples. Internet Search turned up this 2019 post by 
Laurenz Albe—but nothing else at all.

https://www.cybertec-postgresql.com/en/triggers-to-enforce-constraints/

(This is why I CC'd you, Laurenz.)

Laurenz described a use case that's best met by a SQL Assertion. But I don't 
know of a single available relational database system that supports this 
feature—though the SQL Standard defines it. (There's been talk of bringing SQL 
Assertion support to Oracle Database for several years. See 
https://community.oracle.com/tech/apps-infra/discussion/4390732/sql-assertions-declarative-multi-row-constraints. 
But I don't know if any progress has been made.)

Laurenz's example treats the semantics as if the deferred constraint trigger 
fires exactly once, the moment before commit, so that it can check that the 
business rule is met and, if so, perform the commit before a concurrent session 
is allowed to commit a change that would, in such a race condition and together 
with the present session's commit, cause rule violation.

Is this a supported use—even though, as I show below, the trigger fires many 
times and produces the same result each time that it fires? And if so, why 
cannot the definition, optionally, be spelled "after... statement" and fire 
that way?

If the use that Laurenz's example shows is supported, then I could have such a 
trigger on every table that participates in a requirement that's tersely 
expressed using a SQL Assertion (as documentation). I'd simply use the same 
trigger function for the deferred constraint trigger on each of those tables. 
And I'd accept the fact that it unnecessarily (and expensively) fired more 
times than it needed to. Because I want to test only the final state, and 
intermediate states might violate the rule that I want to enforce, I need to 
add logic to populate a log (presumably a temporary table) with an ordinary 
trigger, every time any involved table changes, so that I could check the log 
in the constraint trigger's function. When it has N rows, then the function 
should skip the check except on its Nth invocation.

There's an implicit question lurking here: might it be possible to define a new 
ON COMMIT trigger firing point, legal only for a deferred constraint trigger? 
It should fire just once when a multi-statement transaction is committed if one 
or many of the tables, whose triggers share the same function, see changes.

DETAIL
==

I read this in the Current CREATE TRIGGER section: Constraint triggers must be 
AFTER ROW triggers on plain tables.

And sure enough, this attempt:

create constraint trigger trg
after insert on t
for each statement
execute function trg_fn();

causes a syntax error. This implies that the concept is to program a constraint 
that (like a declarative constraint on a table’s column as part of the table's 
definition) has just single-row scope. Yet I discovered, serendipitously (and 
before reading Laurenz's post), that an “after row” constraint trigger does 
allow querying the table it's on (and other tables) in the trigger function's 
body. I used this technique in the example that I used to start this thread:

"A transaction cannot be ended inside a block with exception handlers"
https://www.postgresql.org/message-id/40eb34e2-f9ca-4d29-b0be-6df3d4d0f...@yugabyte.com
 


Nobody said that I was doing anything unsupported.

I also used it in "The complete testcase" at the end of my final response in 
that thread:
https://www.postgresql.org/message-id/549eb7db-214e-420c-beef-324ab124d...@yugabyte.com
 


Is my entire concept (and Laurenz's too) fundamentally flawed? Specifically, is 
querying a trigger's base table in a "for each row" trigger fundamentally 
unsound and not supported? (In Oracle Database, it causes the notorious 
"mutating table" runtime error.)

This little test shows what actually happens:

create table t1(k serial primary key, v int not null);
create table t2(k serial primary key, v int not null);

create function trg_fn()
  returns trigger
  language plpgsql
as $body$
declare
  n int not null := 0;
begin
  n := (select count(*) from t1) + (select count(*) from t2);
  raise info 'trg fired. new.v = %, n = %', new.v, n;
  return new;
end;
$body$;

create constraint trigger trg
after insert on t1
for each row
execute function trg_fn();

create constraint trigger trg
after insert on t2
for each row
execute function trg_fn();

set default

Re: Deferred constraint trigger semantics

2022-05-10 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> INFO:  trg fired. new.v = 80, n = 8
>> 
>> It shows the expected "new" value each time it fires. And yet the query 
>> reflects the table content on statement completion. This seems to be very 
>> strange.
> 
> From the documentation:
> 
> "Row-level BEFORE triggers fire immediately before a particular row is 
> operated on, while row-level AFTER triggers fire at the end of the statement 
> (but before any statement-level AFTER triggers)."
> 
> https://www.postgresql.org/docs/current/trigger-definition.html

Thanks, David. Those sentences were too deeply buried, in a page with no 
subsections, for me to spot by ordinary searching in the page. The term 
"Row-level BEFORE" trigger subverted my search for "BEFORE EACH ROW" 
trigger—which is the term that I'm used to. This is another lesson for me to 
read every word in what looks like relevant doc, from start to finish, like a 
book. I'm chastened.

The next section, "39.2. Visibility of Data Changes"
https://www.postgresql.org/docs/current/trigger-datachanges.html

says this:

«
* Statement-level triggers follow simple visibility rules: none of the changes 
made by a statement are visible to statement-level BEFORE triggers, whereas all 
modifications are visible to statement-level AFTER triggers.

* The data change (insertion, update, or deletion) causing the trigger to fire 
is naturally not visible to SQL commands executed in a row-level BEFORE 
trigger, because it hasn't happened yet.

* However, SQL commands executed in a row-level BEFORE trigger will see the 
effects of data changes for rows previously processed in the same outer 
command. This requires caution, since the ordering of these change events is 
not in general predictable; an SQL command that affects multiple rows can visit 
the rows in any order.
»

Strangely, the wording here, explicit as it is, makes no mention of what you 
might expect to see in an AFTER EACH ROW trigger. It's a bit of a stretch to 
put the sentences that you quoted from the previous section together with these 
three bullets and conclude that querying the trigger's base table's content 
from the trigger's function is allowed for all five timing points: BEFORE and 
AFTER EACH ROW (not deferred), BEFORE and AFTER EACH STATEMENT (not deferred), 
and AFTER EACH ROW (deferred to commit time). I'll take this to be the case 
unless anybody contradicts me.

The mention of unpredictable results in the third bullet in the BEFORE case 
implies that there's no such unpredictability in the AFTER EACH ROW cases. But 
there has to be a subtle caveat here for the deferred constraint trigger when 
the txn changes two or more tables, all of which participate in the query that 
the trigger function issues. I'll assume that you (all) know what I mean. The 
"raise info" output below illustrates my point (n changes from 5 to 8). But 
this seems to be sound inference from the rules that were stated. I'll take 
this, too, to be the case unless anybody contradicts me.

I assume, though, that considering this output that I showed in my original 
mail:

INFO:  trg fired. new.v = 10, n = 5
INFO:  trg fired. new.v = 20, n = 5
INFO:  trg fired. new.v = 30, n = 5
INFO:  trg fired. new.v = 40, n = 5
INFO:  trg fired. new.v = 50, n = 5
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

the actual order in which I see the "raise info" output is unpredictable while 
the values shown in each *are* predictable. Can I rely on this rule?

Is there a fundamental reason why a deferred AFTER EACH STATEMENT constraint 
trigger is not allowed? Nothing in what is explained in the "Overview of 
Trigger Behavior" and "Visibility of Data Changes" sections lets me see why the 
present restriction is needed.



Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:

Thanks for the point-by-point reply, David.

>> ...makes no mention of what you might expect to see in an AFTER EACH ROW 
>> trigger.
> 
> ...the absence of a comment declaring a guarantee of order means that, like 
> the comment for the row-level BEFORE trigger, the row-level AFTER row 
> ordering is not guaranteed (even if one cannot produce a counter-example in 
> today's codebase).

Got it!

>> ...unless anybody contradicts me.
> 
> Caveat emptor...? I wouldn't be surprised that doing so is technically 
> possible in all cases - as to whether a particular algorithm is sound, to 
> some extent, isn't something we try to predict. We do try to describe all the 
> known interactions though - and let the user work within what those mean for 
> them.

Got it again!

>> ...implies that there's no such unpredictability in the AFTER EACH ROW cases.
> 
> I would not read it that way. In general, absence of mention of 
> predictability like this means there is none - that some other sentence goes 
> into more detail doesn't change that.

OK.

>> But there has to be a subtle caveat here for the deferred constraint trigger 
>> when the txn changes two or more tables, all of which participate in the 
>> query that the trigger function issues… The "raise info" output below 
>> illustrates my point (n changes from 5 to 8).
> 
> I'm failing to see the deferral aspect of that example. First statement 
> finishes, sees the 5 inserts, next statement finishes, sees 3 more inserts. 
> Not, both statements finish, triggers fire, triggers see all 8 inserts (which 
> I suspect they will if you actually perform deferral).

Oops. I made a copy-and-paste error when going from my test environment to email and 
missed out the "deferral" that I'd intended. For completeness, here's the test 
that I meant:

create table t1(k serial primary key, v int not null);
create table t2(k serial primary key, v int not null);

create function trg_fn()
  returns trigger
  language plpgsql
as $body$
declare
  n int not null := 0;
begin
  n := (select count(*) from t1) + (select count(*) from t2);
  raise info 'trg fired. new.v = %, n = %', new.v, n;
  return new;
end;
$body$;

create constraint trigger trg
after insert on t1
for each row
execute function trg_fn();

create constraint trigger trg
after insert on t2
initially deferred
for each row
execute function trg_fn();

set default_transaction_isolation = 'read committed';
do $body$
begin
  insert into t1(v)
  values (10), (20), (30), (40), (50);

  insert into t2(v)
  values (60), (70), (80);
end;
$body$;

It adds the "initially deferred" decoration to the "create constraint trigger" 
statement. This is (still) the result:

INFO:  trg fired. new.v = 10, n = 5
INFO:  trg fired. new.v = 20, n = 5
INFO:  trg fired. new.v = 30, n = 5
INFO:  trg fired. new.v = 40, n = 5
INFO:  trg fired. new.v = 50, n = 5
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

Even though both inserts have completed by commit time, only the trigger firing 
caused by the second statement sees the final state that obtains the moment 
before commit. The first statement sees only the state after it finishes and 
before the second statement executes. You said « I suspect [that both 
statements will see the final state] if you actually perform deferral ». My 
test shows that this is not the case.

Did I misunderstand you? Or does this result surprise you? If it does, do you 
think that this is a bug?

> ...Your test case for the deferred constraint, that supposedly allows for the 
> insertion of invalid data per the specification of the constraint trigger, 
> isn't something I've worked through yet; and as written reads like a bug 
> report.

It's not a report of a PG bug. Rather, it shows how an application programmer 
might write a bug in *their* code. When "set constraints all immediate" is 
used, it opens a race condition window between its execution and the commit. 
I'd speculated on that earlier. So I felt that I should show a self-contained 
demo of this possible trap for the application programmer. As long as  "set 
constraints all immediate" is not used, the demo shows proper behavior. Of 
course, it moves the constraint violation error to commit time—and this means 
that PL/pgSQL code cannot handle it (as discussed at length elsewhere).

I appreciate that using  "set constraints all immediate" is useful in many 
scenarios to allow handling the violation error in PL/pgSQL. (Thanks again for 
that tip, David.) For example, and as I reason it, the famous "mandatory 
one-to-one relationship" use-case is safe when you use this technique. This 
use-case needs mutual FK constraints between the two tables. But you have to 
insert one of the two rows (call it "main") that are so related before the 
other (call it "extra"). And so the mutual FK constraints cannot be satisfied 
until both new rows are in place. Th
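
A minimal sketch of that mutual-FK arrangement (the table names are
hypothetical):

create table main_row(
  k int primary key,
  extra_k int not null);

create table extra_row(
  k int primary key,
  main_k int not null
    references main_row(k) deferrable initially deferred);

alter table main_row add constraint main_row_extra_fk
  foreign key (extra_k) references extra_row(k) deferrable initially deferred;

start transaction;
insert into main_row(k, extra_k) values (1, 1);
insert into extra_row(k, main_k) values (1, 1);
set constraints all immediate;  -- any FK violation surfaces here, where PL/pgSQL can handle it
commit;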

Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> …Internet Search turned up this 2019 post by Laurenz Albe—but nothing else 
>> at all.
>> 
>> https://www.cybertec-postgresql.com/en/triggers-to-enforce-constraints 
>> 
>> 
>> (This is why I CC'd you, Laurenz.)
> 
> So I guess I should answer.

Thanks for replying to my original post with this subject line, Laurenz. Sorry 
to bring up notions that you wrote about three years ago. I judged that, 
because I referred to those notions, it would be polite to copy you—especially 
because I interpreted (maybe incorrectly) what you had written.

> About the starting paragraph of your mail: Constraint triggers are a 
> syntactic leftover from the way that triggers are implemented in PostgreSQL. 
> There is different syntax now, but it was decided to leave constraint 
> triggers, since they may have some use.

If constraint triggers are, in the hands of an informed application programmer, 
to have some sound uses, then the semantics must be clearly defined. And you do 
say that they are supported. David Johnson argues that, as long as you read the 
right snippets from various parts of the doc and synthesize their joint 
meaning, then the semantics are defined. Yes, David, I can accept that—with a 
loud caveat about the answer to my (a) or (b) question below.

I re-read the penultimate paragraph in Laurenz's post:

«
By making the trigger INITIALLY DEFERRED, we tell PostgreSQL to check the 
condition at COMMIT time.
»

I have always understood that (in Postgres and any respectable RDBMS) commits 
in a multi-session environment are always strictly serialized—irrespective of 
the transaction's isolation level. Am I correct to assume this is the case for 
Postgres? I took "at COMMIT time" to mean "as part of the strictly serialized 
operations that implement a session's COMMIT". But I see (now) that you argue 
that this is not the case, thus:

«
This will reduce the window for the race condition a little, but the problem is 
still there. If concurrent transactions run the trigger function at the same 
time, they won’t see each other’s modifications.
»

I take what you say in your post to mean that each session executes its 
deferred constraint check (by extension, not just for constraint triggers but 
for all deferred constraint cases) momentarily *before* COMMIT so that the 
effect is only to reduce the duration of the race condition window rather than 
to eliminate it.

So it all depends on a lawyerly reading of the wording "at COMMIT time". The 
current CREATE TABLE doc says this:

«
If the constraint is INITIALLY DEFERRED, it is checked only at the end of the 
transaction.
»

The wording "at the end of the transaction" is not precise enough to 
adjudicate—and so the key question remains: Is a deferred constraint checked:

(a) as part of the strictly serialized operations that implement a session's 
COMMIT?

or

(b) momentarily *before* COMMIT and not within the serialized COMMIT execution?

So… (asking the wider audience) is the answer (a) or (b)? An if it's (b), why? 
After all, (b) brings the race condition risk. Is (a) simply not feasible?

> [Lots of ruminations and wandering throughts]
> 
> Sorry, that was too much for me to comment on - that would require a 
> mid-sized article.

Oh… I'm sorry to hear that I rambled and lost clarity. I find it impossible to 
say what I want to, striving for precision, without sacrificing brevity. I 
always find that I can improve my wording with successive reads. But life is 
short and I have, eventually, just to publish and be damned.

>> Is my entire concept (and Laurenz's too) fundamentally flawed? Specifically, 
>> is querying a trigger's base table in a "for each row" trigger fundamentally 
>> unsound and not supported?
> 
> My post claims that constraint triggers alone are *not* a sufficient solution 
> to validate constraints - you need additional locking or SERIALIZABLE 
> isolation to make that work reliably.

This doesn't seem to be what you wrote. These two headings

> Solving the problem with “optimistic locking” (which you explain means using 
> SERIALIZABLE)

and

> What about these “constraint triggers”?

read as if they are orthogonal schemes where the constraint trigger approach 
does not rely on SERIALIZABLE.

As I reason it, if you use the SERIALIZABLE approach, then an ordinary 
immediate AFTER EACH STATEMENT trigger will work fine—precisely because of how 
that isolation level is defined. So here, a deferred constraint trigger isn't 
needed and brings no value.
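
For example, a sketch of such an ordinary statement-level trigger (using the
"17 and 42 must not coexist" rule on table "t", and assuming that every
transaction that touches "t" runs at the serializable level):

create function t_rule_fn()
  returns trigger
  language plpgsql
as $body$
begin
  if exists (select 1 from t where k = 17)
     and exists (select 1 from t where k = 42)
  then
    raise exception 'rule violated: 17 and 42 must not coexist';
  end if;
  return null;
end;
$body$;

create trigger t_rule
after insert or update or delete on t
for each statement
execute function t_rule_fn();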

This implies that if a deferred constraint trigger is to have any utility, it 
must be safe to use it (as I tested it) at the READ COMMITTED level. I do see 
that, though I appear to be testing this, I cannot do a reliable test because I 
cannot, in application code, open up, and exploit, a race condition window 
after COMMIT has been issued. (I *am* able 

Re: Deferred constraint trigger semantics

2022-05-11 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Oops. I did a copy-and-paste error on going from my test env. to email and 
>> missed out the "deferral" that I'd intended. For completeness, here's the 
>> test that I meant:
>> 
>> create constraint trigger trg
>> after insert on t1
>> for each row
>> execute function trg_fn();
>> 
>> create constraint trigger trg
>> after insert on t2
>> initially deferred
>> for each row
>> execute function trg_fn();
>> 
>> It adds the "initially deferred" decoration to the "create constraint 
>> trigger" statement. This is (still) the result:
> 
> You only added it to the uninteresting trigger on t2.  It's the t1 trigger 
> where I'd expect the behavior to change.  I'm assuming your test does both 
> (not in a position to test it myself at the moment).

Damn. I'm horrified that, despite my best intentions, I still managed to make a 
typo. How embarrassing… With the correction in place, I now get this output:

INFO:  trg fired. new.v = 10, n = 8
INFO:  trg fired. new.v = 20, n = 8
INFO:  trg fired. new.v = 30, n = 8
INFO:  trg fired. new.v = 40, n = 8
INFO:  trg fired. new.v = 50, n = 8
INFO:  trg fired. new.v = 60, n = 8
INFO:  trg fired. new.v = 70, n = 8
INFO:  trg fired. new.v = 80, n = 8

This is exactly what you predicted. I'm delighted (finally) to see this outcome.

>> [What I wrote here was rubbish, given that my test code was not what I 
>> claimed it was.]
> 
> [David's response here is now moot.]
>  
>> With respect to « having to keep around a working set of what are the 
>> changed records » I think that the complexity that you envisaged is avoided 
>> by the (emergent) rule that an AFTER EACH STATEMENT trigger cannot see "old" 
>> and "new" values. In other words, all you can sensibly do in its function is 
>> ordinary SQL that sees the current state at the moment it fires.
>> 
>> To my surprise, it *is* legal to write code that accesses "old" and "new" 
>> values. But, because many rows can be affected by a single statement, and 
>> the trigger fires just once, the meanings of "old" and "new" are undefined. 
>> I've seen that, in any test that I do, both are always set to NULL (which 
>> seems reasonable).
> 
> I was thinking more about transition tables - though I admit it's not a 
> complete thought given their opt-in nature.

Ah… I hadn't considered transition tables. However, they don't seem to be 
valuable for a constraint trigger. So your concern could be removed at a stroke 
by adding a semantic rule to the account of the CREATE TRIGGER syntax 
specification thus:

« The SQL that creates a statement-level AFTER constraint trigger may not 
specify using transition tables. »

Might this device give me hope?

Finally, it seems that a non-deferrable constraint trigger has no value—except 
in that using the keyword CONSTRAINT is a nice bit of self-documentation. Did I 
miss something? Is this trigger flavor valuable for a use-case that I haven't 
spotted?



Re: Restricting user to see schema structure

2022-05-12 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
>> neerajmr12...@gmail.com wrote:
>> 
>> Is there anyway that we can restrict a user from seeing the schema 
>> structure. I can restrict the user from accessing the data in the schema but 
>> the user is still able to see the table names and what all columns are 
>> present in them.
> 
> No.


Here’s something that you can do, Neeraj. But you have to design your app this 
way from the start. It'd be hard to retrofit without a re-write.

Design (and document the practice) to encapsulate the database functionality 
(i.e. the business functions that the client side app must perform) behind an 
API exposed as user-defined functions that return query results for SELECT 
operations and outcome statuses (e.g. "success", "This nickname is taken. Try a 
different one", "Unexpected error. Report incident ID N to Support"). JSON 
is a convenient representation for all possible return values.
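
For example, a sketch of one such function for a SELECT-style business function
(the schema, table, and function names, and the JSON shape, are just
illustrative):

create schema if not exists data;
create schema if not exists api;
create table if not exists data.nicknames(nickname text primary key);

create function api.nicknames_report()
  returns jsonb
  language plpgsql
as $body$
begin
  return jsonb_build_object(
    'status', 'success',
    'rows',   (select coalesce(jsonb_agg(to_jsonb(n)), '[]'::jsonb)
               from data.nicknames n));
end;
$body$;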

Use a regime of users, schemas, and privilege grants (functions having 
"security definer" mode) to implement the functionality. Create a dedicated 
user-and-schema to expose the API and nothing else. This will own only 
functions that are thin jackets to invoke the real work-doing functions that 
are hidden from the client. Allow clients to authorize ONLY as the API-owning 
user.  Grant "execute" on its functions to what's needed elsewhere.

I've prototyped this scheme. It seems to work as designed. A client that 
connects with psql (or any other tool) can list the API functions and whatever 
\df and \sf show. (Notice that \d will reveal nothing.) But doing this reveals 
only the names of the functions that are called (which will be identical to the 
jacket names—so no risk here) and the name(s) of the schema(s) where they live 
(so a minor theoretical risk here). 

Full disclosure: I've never done this in anger.

Note: I believe this approach to be nothing other than the application of the 
time-honored principles (decades old) of modular software construction (where 
the entire database is a top-level module in the overall app's decomposition). 
It brings the security benefit that I sketched along with all the other famous 
benefits of modular programming, especially that the client is shielded from 
table design changes.



Re: Deferred constraint trigger semantics

2022-05-12 Thread Bryn Llewellyn
> postgres.ro...@gmail.com wrote:
> 
> It adds the "initially deferred" decoration to the "create constraint 
> trigger" statement. This is (still) the result:
> 
> INFO:  trg fired. new.v = 10, n = 5
> INFO:  trg fired. new.v = 20, n = 5
> INFO:  trg fired. new.v = 30, n = 5
> INFO:  trg fired. new.v = 40, n = 5
> INFO:  trg fired. new.v = 50, n = 5
> INFO:  trg fired. new.v = 60, n = 8
> INFO:  trg fired. new.v = 70, n = 8
> INFO:  trg fired. new.v = 80, n = 8
> Because You can do 
> create constraint trigger trg
> after insert on t2
> deferrable initially deferred
> for each row
> execute function trg_fn();
> 
> You didn't explicitly defer the trigger trg on t1!. That means after you 
> insert on t1 then the trigger trg on t1 invoked rather than on commit time.  
> If you 
> create constraint trigger trg
> after insert on t1
> deferrable initially deferred
> for each row
> execute function trg_fn();
> 
> create constraint trigger trg
> after insert on t2
> deferrable initially deferred
> for each row
> execute function trg_fn();
> then you will get 
> INFO:  0: trg fired. new.v = 10, n = 8
> INFO:  0: trg fired. new.v = 20, n = 8
> INFO:  0: trg fired. new.v = 30, n = 8
> INFO:  0: trg fired. new.v = 40, n = 8
> INFO:  0: trg fired. new.v = 50, n = 8
> INFO:  0: trg fired. new.v = 60, n = 8
> INFO:  0: trg fired. new.v = 70, n = 8
> INFO:  0: trg fired. new.v = 80, n = 8 

Er… yes. David Johnston pointed that out too. I'm embarrassed beyond belief. 
Sorry to have wasted folks' time because of my mistake.



Re: Restricting user to see schema structure

2022-05-12 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> adrian.kla...@aklaver.com wrote:
>> 
>>> b...@yugabyte.com wrote:
>>> 
>>> I've prototyped this scheme. It seems to work as designed. A client that 
>>> connects with psql (or any other tool) can list the API functions and 
>>> whatever \df and \sf show. (notice that \d will reveal nothing.)But doing 
>>> this reveals only the names of the functions that are called (which will be 
>>> identical to the jacket names—so no risk here) and the name(s) of the 
>>> schema(s) where they live (so a minor theoretical risk here).
>>> 
>>> Full disclosure: I've never done this in anger.
>> 
>> Try select * from pg_class or select * from pg_attribute or any of the other 
>> system catalogs.
> 
> Which is exactly what most GUI applications that provide object browsing and 
> viewing are going to use.

Oops. I made the unforgivable mistake of saying something without first having 
run a script to demonstrate what I'd planned to say. I'm embarrassed (again). I 
confused my memory of the proof-of-concept demo that I'd coded in PG with what, 
back in the day, I'd coded in Oracle Database. (The visibility notions in ORCL 
are very much more granular than in PG.)

I re-coded and re-ran my PG proof-of-concept demo. It creates a dedicated 
database "app" and dedicated users "data", "code", and "api" to own the 
application objects, each in a schema with the same name as the owning user. 
These have the purposes that their names suggest. As it progresses, it creates 
the table "data.t", the function "code.f", and the function "api.f" (as a 
minimal jacket to invoke "code.f"). Finally, it creates the user "client" with 
no schema but with "usage" on the schema "api" and "execute" on (in general) 
each of its functions. The idea is that "client" has been explicitly given only 
the privileges that are necessary to expose the functionality that has been 
designed for use by connecting client sessions.
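
In outline, the setup script looks something like this (a sketch reconstructed
from the description above; passwords, and the revoking of defaults like the
"public" execute privilege on functions, are omitted):

create database app;
\c app postgres

create user data;
create user code;
create user api;
create user client;

create schema data authorization data;
create schema code authorization code;
create schema api  authorization api;

create table data.t(k serial primary key);
alter table data.t owner to data;

-- The work-doing function, hidden from clients.
create function code.f()
  returns int
  language plpgsql
  security definer
as $body$
begin
  return (select count(*) from data.t);
end;
$body$;
alter function code.f() owner to code;
grant usage  on schema data to code;
grant select on data.t     to code;

-- The thin jacket that the API schema exposes.
create function api.f()
  returns int
  language plpgsql
  security definer
as $body$
begin
  return code.f();
end;
$body$;
alter function api.f() owner to api;
grant usage   on schema code       to api;
grant execute on function code.f() to api;

-- "client" is given only the "api" schema and its functions.
grant usage   on schema api       to client;
grant execute on function api.f() to client;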

When the setup is done, and when connected as "client", it runs a UNION query 
using "pg_class", "pg_proc", and "pg_namespace". I restricted it to exclude all 
the objects owned by the installation user (in my case, on macOS, "Bllewell").

As you'd all expect, this is the result:

 owner | schema_name | object_kind | object_name 
-------+-------------+-------------+-------------
 api   | api         | function    | f
 code  | code        | function    | f
 data  | data        | index       | t_pkey
 data  | data        | sequence    | t_k_seq
 data  | data        | table       | t

Without the restriction, and again as you'd all expect, the query shows every 
single schema object in the entire database. Other queries show all the users 
in the cluster. Queries like the ones I used here allow "\d", "\df", and the 
like to show lots of the facts about each kind of object in the entire 
database. And, yes, I did know this.

However, the design decision that, way back when, leads to this outcome does 
surprise me. The principle of least privilege insists that (in the database 
regime) you can create users that can do exactly and only what they need to do. 
This implies that my "client" should not be able to list all the objects in the 
database (and all the users in the cluster).

Here's what the exercise taught me: When connected in psql as "client", and 
with "\set VERBOSITY verbose", this:

select * from data.t;

causes this expected error:

ERROR:  42501: permission denied for schema data

But this:

\sf code.f

causes this unexpectedly spelled error (with no error code):

ERROR:  permission denied for schema code

Nevertheless, this:

select pg_catalog.pg_get_functiondef((
  select p.oid
  from pg_catalog.pg_proc p
  join pg_catalog.pg_namespace n
  on p.pronamespace = n.oid
  where
p.proowner::regrole::text = 'code'  and
n.nspname::text = 'code' and
p.prokind = 'f' and
p.proname::text = 'f'
  ));

sidesteps the check that "\sf" uses, runs without error and produces this 
result:

 CREATE OR REPLACE FUNCTION code.f()+
  RETURNS integer   +
  LANGUAGE plpgsql  +
  SECURITY DEFINER  +
 AS $function$  +
 begin  +
   return (select count(*) from data.t);+
 end;   +
 $function$ +

So it seems that the implementation of "\sf" adds its own ad hoc privilege 
checks and, when needed, outputs an error message that its own code generates. 
Strange.

I see now that my quest to handle, and sanitize, unexpected errors in PL/pgSQL 
exception sections has only rather limited value. It can aid usability, for 
example by changing "unique_violation" (with all sorts of stuff about line 
numbers and the like) to "This nickname is taken". However, in the case of 
errors like this:

22001: value too long for type character varying(8)

while again the sanitized "Nickname must be no more than eight characters" is 
nice, it doesn't prevent

Re: Restricting user to see schema structure

2022-05-12 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> However, the design decision that, way back when, leads to this outcome does 
>> surprise me. The principle of least privilege insists that (in the database 
>> regime) you can create users that can do exactly and only what they need to 
>> do. This implies that my "client" should not be able to list all the objects 
>> in the database (and all the users in the cluster).
> 
> If there was any motivation to improve PostgreSQL on this front I'd like them 
> to start with "routine bodies" being hidden away from inspection. I'm much 
> less concerned about pg_class or even knowing the names of things.
> 
> This has been discussed a number of times, probably every few years or so. My 
> quick search failed to find any relevant links/threads in the archives, 
> though I didn't try that hard.

Thanks (again) David. Yes, there is an argument that when app developers know 
that hackers can read every minute detail of their implementation (but, with a 
sound user/schema/privileges discipline cannot change any of this), it cautions 
them to be extra scrupulous. SQL injection is maybe a good example. It's 
probably easier and quicker to scan PL/pgSQL source code looking for obvious 
patterns (like "any use of dynamic SQL?", "If yes, any concatenation of 
literals into the to-be-executed statement?", and so on) than it is to send 
robotically generated values via browser-UI screens in the hope of provoking 
tell-tale errors.

It certainly helps to know that nothing in how PG works in the space that's 
relevant here is going to change in my lifetime.



Re: Restricting user to see schema structure

2022-05-12 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
>> b...@yugabyte.com writes:
>> 
>> Maybe this entire discussion is moot when hackers can read the C code of 
>> PG's implementation…
> 
> We have pretty much no interest in revisiting that design choice, even if 
> doing so wouldn't likely break a couple decades' worth of client-side 
> software development.
> 
> Anyway, if you feel a need to prevent user A from seeing the catalog entries 
> for user B's database objects, the only answer we have is to put A and B into 
> separate databases.  If despite that you want A and B to be able to share 
> some data, you can probably build the connections you need using foreign 
> tables or logical replication; but there's not a lot of pre-fab 
> infrastructure for that, AFAIK.

Thanks Tom. It certainly helps to know that nothing in how PG works in the 
space that's relevant here is going to change in my lifetime. (I just wrote 
exactly the same in reply to David Johnston.)

My sense is that the database is intended to be a hermetic unit of 
encapsulation and provides some of the features that multi-tenancy requires. 
But there's the caveat that users are defined, and operate, cluster-wide.

If a cluster has two databases, "app_1" and "app_2", each populated using the 
general scheme that I sketched, then users "client_1" and "client_2" (designed, 
respectively to let them operate as intended in their corresponding databases) 
could always connect each to the other's database. They couldn't do much in the 
"wrong" database. But they could certainly list out all the application's 
objects and the source code of all the application's user-defined subprograms.

In general, it's best to use any system in the way that it was designed to be 
used. And PG was designed to allow all users to see the metadata account of all 
of the content of every database in the cluster—but not to use any of the 
content unless this has been specifically catered for.

Re: Deferred constraint trigger semantics

2022-05-12 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> …I have always understood that (in Postgres and any respectable RDBMS) 
>> commits in a multi-session environment are always strictly 
>> serialized—irrespective of the transaction's isolation level. Am I correct 
>> to assume this is the case for Postgres? I took "at COMMIT time" to mean "as 
>> part of the strictly serialized operations that implement a session's 
>> COMMIT".
> 
> I am not sure what you mean by serialized commits. Transactions are 
> concurrent, and so are commits. COMMIT takes some time, during which several 
> things happen, among them executing deferred constraints, writing a WAL 
> record and flushing the WAL. The only thing that is necessarily serialized is 
> writing the WAL record.

Oh. I was wrong, then. I'll say more on this below.

>> …I take what you say in your post to mean that each session executes its 
>> deferred constraint check (by extension, not just for constraint triggers 
>> but for all deferred constraint cases) momentarily *before* COMMIT so that 
>> the effect is only to reduce the duration of the race condition window 
>> rather than to eliminate it.
> 
> In the case of constraint triggers, yes. But there is no race condition for 
> primary key, unique and foreign key constraints, because they also "see" 
> uncommitted data.

I can't follow you here, sorry. I tried this:

create table t(
  k serial primary key,
  v int not null,
  constraint t_v_unq unique(v) initially deferred);

-- RED
start transaction isolation level read committed;
insert into t(v) values (1), (2);
select k, v from t order by k;

-- BLUE
start transaction isolation level read committed;
insert into t(v) values (1), (3);
select k, v from t order by k;

-- RED
commit;
select k, v from t order by k;

-- BLUE
select k, v from t order by k;
commit;
select k, v from t order by k;

The first "select" from the "BLUE" session at the very end produces this:

 k | v 
---+---
 1 | 1
 2 | 2
 3 | 1
 4 | 3

This doesn't surprise me. It's ultimately illegal. But not yet. (Before "RED" 
committed, "BLUE" didn't see the rows with "k = 1" and "k = 2". So it isn't 
seeing any other sessions uncommitted data—but only it's own uncommitted data.)

Then, when "BLUE" commits, it (of course) gets this:

ERROR:  duplicate key value violates unique constraint "t_v_unq"
DETAIL:  Key (v)=(1) already exists.

Then it sees (of course, again) only the rows with "k = 1" and "k = 2"—the same 
as what "RED" saw.

It seems to be impossible to do a test in slow motion where "RED" and "BLUE" 
each issues "commit" at the exact same moment. So thinking about this scenario 
doesn't tell me if:

(a) Each session runs its constraint check and the rest of what "commit" 
entails in a genuinely serialized fashion.

OR

(b) Each session first runs its constraint check (and some other stuff) with no 
serialization—and only then runs the small part of the total "commit" action 
(the WAL part) serially. (This would result in bad data in the database at 
rest—just as my contrived misuse of "set constraints all immediate" left things 
in my "one or two admins" scenario.)

I appreciate that this just is a different wording of what I wrote before—but 
now w.r.t. the system-implemented unique constraint use-case.

The (a) scheme sounds correct. And the (b) scheme sounds wrong. Why would PG 
prefer to implement (b) rather than (a)?

I'm clearly missing something.

>> So it all depends on a lawyerly reading of the wording "at COMMIT time". The 
>> current CREATE TABLE doc says this:
>> 
>> «
>> If the constraint is INITIALLY DEFERRED, it is checked only at the end of 
>> the transaction.
>> »
>> 
>> The wording "at the end of the transaction" is not precise enough to 
>> adjudicate—and so the key question remains: Is a deferred constraint checked:
>> 
>> (a) as part of the strictly serialized operations that implement a session's 
>> COMMIT?
>> 
>> or
>> 
>> (b) momentarily *before* COMMIT and not within the serialized COMMIT 
>> execution?
>> 
>> So… (asking the wider audience) is the answer (a) or (b)? An if it's (b), 
>> why? After all, (b) brings the race condition risk. Is (a) simply not 
>> feasible?
> 
> COMMITs are not serialized. You seem to think that as soon as one 
> transaction's COMMIT starts processing, no other transaction may COMMIT at 
> the same time. That is not the case.

Yes, I most certainly did think this.

Where, in the PG doc, can I read the account of the proper mental model for the 
application programmer? It seems to be impossible to conduct an experiment that 
would disprove the hypothesis that one, or the other, of these mental models is 
correct.

 Is my entire concept (and Laurenz's too) fundamentally flawed? 
 Specifically, is querying a trigger's base table in a "for each row" 
 trigger fundamentally unsound and not supported?
>>> 
>>> My post claims that constraint triggers alone are *not* a sufficient 
>>> solution to validate co

Re: Restricting user to see schema structure

2022-05-12 Thread Bryn Llewellyn
> neerajmr12...@gmail.com wrote:
> 
> I am using pgAdmin . I have a database 'db' and it has got 2 schemas 
> 'schema1' and 'schema2', I have created some views in schema2  from tables of 
> schema1. I have created a new user and granted connection access to database 
> and granted usage on tables and views of schema2 only. But now the problem is 
> that the new user is able to see the table names of schema1 even though the 
> user cannot see the data present in them they can see the table names. Is 
> there any way I can completely hide schema1 from the new user.

What exactly do you mean by "have created a new user and granted connection 
access to database"? As I understand it, there's no such thing. I mentioned a 
simple test in my earlier email that showed that any user (with no schema of 
its own and no granted privileges) can connect to any database—and see the full 
metadata account of all its content. I'm teaching myself to live with this.

Re: Restricting user to see schema structure

2022-05-12 Thread Bryn Llewellyn
> neerajmr12...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> What exactly do you mean by "have created a new user and granted connection 
>> access to database"? As I understand it, there's no such thing. I mentioned 
>> a simple test in my earlier email that showed that any user (with no schema 
>> of its own and no granted privileges) can connect to any database—and see 
>> the full metadata account of all its content. I'm teaching myself to live 
>> with this.
> 
> What I meant by 'created a new user' is that I have used the following 
> commands.
> 
> CREATE USER  WITH ENCRYPTED PASSWORD '';
> GRANT CONNECT ON DATABASE  TO ;
> GRANT USAGE ON SCHEMA  TO ; 

Ah… there's obviously something I don't understand here. I've never used "grant 
connect on database"—and not experience an ensuing problem. I just tried this:

\c postgres postgres
create user joe login password 'joe';
revoke connect on database postgres from joe;
\c postgres joe

It all ran without error. (I've turned off the password challenge in my MacBook 
PG cluster.) I don't have a mental model that accommodates this. And a quick 
skim for this variant in the "GRANT" section of the PG doc didn't (immediately) 
help me. I obviously need to do more study. I'll shut up until I have.

Re: Deferred constraint trigger semantics

2022-05-12 Thread Bryn Llewellyn
> laurenz.a...@cybertec.at wrote:
> 
> Be "seeing" I didn't mean "show to the user". I mean that the code that 
> implements PostgreSQL constraints takes uncommitted data into account.
> 
> The documentation describes that for the case of uniqueness in some detail:
> 
> https://www.postgresql.org/docs/current/index-unique-checks.html
> 
> I'd say that the proper mental model is that you don't need to care. The ACID 
> properties are guarantees that the database makes, and these guarantees are 
> usually negative: "no sequence of actions can result in the violation of a 
> unique constraint" or "you don't get to see uncommitted data".
> 
> The exact sequence of what happens during COMMIT is interesting, but 
> irrelevant to the programmer.  All that counts is "a deferred constraint is 
> checked between the time that COMMIT starts processing and the time that it 
> returns".
> 
> If you want to know more, you have to start reading the code. It is open 
> source and well documented.

Thanks for the quick reply, Laurenz. I'm signing off now (US/Pacific) for a 
long weekend. I'll try to digest what you wrote on Monday.

Re: Deferred constraint trigger semantics

2022-05-16 Thread Bryn Llewellyn
> laurenz.albe@cybertec.at wrote:
> 
>> b...@yugabyte.com wrote:
> 
>> 
>> …I tried this:
>> 
>> create table t(
>>   k serial primary key,
>>   v int not null,
>>   constraint t_v_unq unique(v) initially deferred);

Here's a better test:

-- BLUE session
start transaction isolation level read committed;
insert into t(v) values (1), (2);

-- RED session
start transaction isolation level read committed;
insert into t(v) values (1), (3);

-- BLUE session
set constraints all immediate;

-- RED session (hangs until BLUE commits).
-- Then, when it does, gets ERROR... "Key (v)=(1) already exists"
set constraints all immediate;

-- BLUE session
commit;

-- RED session
-- There are no changes to commit 'cos they were rolled back.
commit;

-- "select k, v from t order by k" (in each session) now shows that both 
sessions meet the constraint.

>> Where, in the PG doc, can I read the account of the proper mental model for 
>> the application programmer?
> 
> [See https://www.postgresql.org/docs/current/index-unique-checks.html.]

Thanks for referring me to the account "62.5. Index Uniqueness Checks". It's in 
the section "Part VII. Internals" (…contains assorted information that might be 
of use to PostgreSQL developers). I wouldn't expect to read this because I 
don't intend to write code that might become part of PG's implementation.

> I'd say that the proper mental model is that you don't need to care… The 
> exact sequence of what happens during COMMIT is interesting, but irrelevant 
> to the programmer. All that counts is "a deferred constraint is checked 
> between the time that COMMIT starts processing and the time that it returns".

Yes, I very much like this stance. It seems that, for built-in constraints 
(like "unique" or "foreign key") it's enough to understand that PG implements 
these at the "read committed" isolation level by using methods (that aren't 
exposed via SQL) to peep below the application programmer's MVCC view of the 
world to check the uncommitted state of other, concurrent, sessions.

This explains why, in the (new) test that I used above, the conflict is 
detected when the second session issues "set constraints all immediate" after 
the first already did this (i.e. long before COMMIT). In this case, the second 
session hangs until the first commits—at which point the second sees the 
uniqueness violation error.

In other words, the automagic implementation of the enforcement of built-in 
constraints allows the safe use of "set constraints all immediate" to provoke a 
possible early error that can, very usefully, be handled in PL/pgSQL code. This 
is the clue to understanding why the check of a built-in constraint, when it's 
performed as an implicit consequence of "commit", doesn't need to be within the 
small part of the operations that "commit" causes that are strictly serialized. 
(You've explained how this helps performance in multi-session scenarios.)

Critically, the special methods that implement the enforcement of built-in 
constraints aren't accessible in PL/pgSQL code and therefore not accessible in 
the "ordinary" implementation of trigger functions. This is the point that I 
failed to grasp. (Though I do see, now, that Laurenz's post says this clearly.)

I was able to demonstrate this by implementing a unique constraint with a 
deferred constraint trigger (and no use of "set constraints all immediate"). I 
simply introduced "pg_sleep(5)" between the trigger function's actual check and 
its final "return null". I copied the code below for completeness.

The "BLUE" session, because it reaches its serialized "commit" actions first, 
sees an outcome that meets the constraint. But the "RED" session has enough 
time to do its check before "BLUE" does its serialized "commit" actions. So its 
test passes too. This leaves the final database in conflict with the intended 
constraint.

I see now that the only robust use of an ordinarily (i.e. not using C) 
implemented constraint trigger (deferred or otherwise) is to enforce a single 
row-constraint. (There's a caveat that maybe, after careful analysis, you can 
work out a cunning lockings scheme to allow the safe implementation of an 
entity-level constraint without using C. But the "exactly one or two admins in 
a department" example shows that this isn't generally possible.) So it's 
reasonable that a constraint trigger must be AFTER EACH ROW. Further, it would  
make no sense to do SQL from its implementation function because the only 
values that you might defensibly use are available simply via "old" and "new".

So all that stuff I was concerned about where the deferred constraint fires 
many times when once is enough falls away because the larger endeavor makes no 
sense. (I suppose that it might matter if you implemented the trigger function 
robustly in C.)

It does strike me that the ordinary application programmer—who reads just the 
sections "CREATE TRIGGER", "Chapter 39. Triggers", "CREATE TABLE", and "SET 
CONSTRAINTS"—will

Re: Restricting user to see schema structure

2022-05-16 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>>> neerajmr12...@gmail.com wrote:
>>> 
>>> ...
>> 
>> What exactly do you mean by "have created a new user and granted connection 
>> access to database"? As I understand it, there's no such thing. I mentioned 
>> a simple test in my earlier email that showed that any user (with no schema 
>> of its own and no granted privileges) can connect to any database—and see 
>> the full metadata account of all its content. I'm teaching myself to live 
>> with this.
> 
> Besides the REVOKE CONNECT, it is also possible to prevent connections to a 
> given database by a particular user by using settings in pg_hba.conf.

Adrian, I have the "pg_hba.conf" unmodified that came with the "brew" PG 
installation of PG Version 14.2 on my MacOS Big Sur laptop. It has just six 
non-comment lines, thus:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
local   all             all                                     trust
host    all             all             127.0.0.1/32            trust
host    all             all             ::1/128                 trust
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

This lines up with what "select * from pg_hba_file_rules" gets, thus:

 line_number | type  |   database    | user_name |  address  |                 netmask                 | auth_method | options | error
-------------+-------+---------------+-----------+-----------+-----------------------------------------+-------------+---------+-------
          89 | local | {all}         | {all}     |           |                                         | trust       |         |
          91 | host  | {all}         | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         |
          93 | host  | {all}         | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         |
          96 | local | {replication} | {all}     |           |                                         | trust       |         |
          97 | host  | {replication} | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         |
          98 | host  | {replication} | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         |

I read the Current "21.1. The pg_hba.conf File" section and noted this tip:

«
To connect to a particular database, a user must not only pass the pg_hba.conf 
checks, but must have the CONNECT privilege for the database. If you wish to 
restrict which users can connect to which databases, it's usually easier to 
control this by granting/revoking CONNECT privilege than to put the rules in 
pg_hba.conf entries.
»

I'd like to do what this tip says. But the regime that I have allows any 
non-super user to connect to any database.

I just re-tested this with a brand-new user "joe"—and after doing "revoke 
connect on database postgres from joe".

I'm obviously missing critical "pg_hba.conf" line(s). But I can't see what to 
add from the section that I mentioned. There must be some keyword, like "none", 
meaning the opposite of "all" for users.

I tried this. (I don't have a database called "replication" so I removed those 
lines.)

local   postgres        postgres                        trust
host    postgres        postgres        127.0.0.1/32    trust
host    postgres        postgres        ::1/128         trust

But that idea didn't work because, with my newly created user "joe", my "\c 
postgres joe" failed with a complaint that my "pg_hba.conf" had no entry for « 
user "joe", database "postgres" ».

I discovered (by "drop user") that « role name "none" is reserved ». So I added 
these lines:

local   postgres        none                            trust
host    postgres        none            127.0.0.1/32    trust
host    postgres        none            ::1/128         trust

But even after "grant connect on database postgres to joe", my "\c postgres 
joe" still failed just as I described above. For sport, I tried this instead:

local   postgres        joe                             trust
host    postgres        joe             127.0.0.1/32    trust
host    postgres        joe             ::1/128         trust

But this goes against what the tip says. Anyway, after "revoke connect on 
database postgres from joe", my "\c postgres joe" succeeded.

I tried Googling. But all the hits that I found were about controlling which 
remote hosts can connect at all and what password security is to be used.

What must I do? And where is this described in the PG doc?



Re: Restricting user to see schema structure

2022-05-16 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>>> adrian.kla...@aklaver.com wrote:
>>> 
 b...@yugabyte.com  wrote:
 
> neerajmr12...@gmail.com  wrote:
> 
> ...
 
 What exactly do you mean by "have created a new user and granted 
 connection access to database"?
>>> 
>>> Besides the REVOKE CONNECT, it is also possible to prevent connections to a 
>>> given database by a particular user by using settings in pg_hba.conf.
>> 
>> 
>> Adrian, I have the "pg_hba.conf" unmodified that came with the "brew" PG 
>> installation of PG Version 14.2 on my MacOS Big Sur laptop. It has just six 
>> non-comment lines... [But things don't work as I want.]
> 
> Because as mentioned previously you did not "revoke connect on database 
> postgres from public".

Right, I see the importance of this now. I now realize that when a database is 
created, CONNECT on it is automatically granted to PUBLIC. But there's no 
mention of this (or what to read to learn that this is the case) in the 
"pg_hba.conf" chapter. Nor does the section on the CREATE DATABASE statement 
mention this. How is the neophyte supposed to know about this behavior?
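
For the record, a minimal sketch of the fix that Adrian's reply implies, 
re-using the "joe" user from my earlier test (the error wording in the comments 
is approximate):

\c postgres postgres
revoke connect on database postgres from public;

-- "joe" has no CONNECT privilege of his own and no longer inherits one from
-- PUBLIC, so this now fails with a "permission denied for database" error.
\c postgres joe

-- An explicit grant restores the ability to connect.
\c postgres postgres
grant connect on database postgres to joe;
\c postgres joe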

Another thing that confused me was the significance of the lines for the 
database "replication" in the "pg_hba.conf" that came with my installation. Add 
to this the mutually exclusive keywords "REPLICATION" and "NOREPLICATION" in 
the CREATE ROLE statement. ("These clauses determine whether a role is a 
replication role.") So this seems to be a distinct use of the word from how 
it's used in "pg_hba.conf" as the name of a database (that might well not 
exist). Strangely, the CREATE ROLE doc says that you don't need either 
"REPLICATION" or "NOREPLICATION" but it doesn't say what the default is.

David Johnston wrote this in a separate thread:

> I don't quite know how to address your random experimentation with 
> pg_hba.conf.  None of the things you showed are surprising though - were you 
> expecting different?

My reports of my random experimentation were the email equivalent of the "think 
aloud" approach to usability testing. That paradigm has the creators of a 
system observe a new user trying to get things done (using any appropriate doc 
that's available). Sometimes, the user appears to be trying things randomly. 
Then the creators ask "why did you do that"—and they learn what faulty mental 
model the user has formed. And then they try to find out how the new user came 
to acquire that model. Often, the problem is that the doc (or the UI, when it's 
meant to be self-evident) suffers from what Steven Pinker calls the "curse of 
knowledge" in his book "The Sense of Style".

Anyway, with my experimentation and with the clues that you two (Adrian and 
David) have given me, I arrived at the following practice. It seems to give 
me what I want—i.e. a regime where ordinary new users that I create can operate 
without me needing to change the "pg_hba.conf" file and where they can connect 
to the one-and-only database that I intend and then perform exactly and only 
the tasks that I intend—in other words, a regime that honors the principle of 
least privilege. (We've already discussed the caveat that I can't prevent them 
from reading all of the metadata across all databases.)

* I use this bare bones "pg_hba.conf" file.

 local   all             all                             trust
 host    all             all             127.0.0.1/32    trust
 host    all             all             ::1/128         trust

* I say "\c postgres postgres" and use a script to strip the cluster done to 
its bare minimum—in my case: the users "Bllewell" and postgres; and the 
databases postgres, template0, and template1.

I say "revoke connect on database postgres from public" and "drop schema if 
exists public". (And I drop any other schemas that might have been created in 
the database postgres).

* When I create a database, I immediately drop its public schema and revoke 
connect on it from public.

* When I create a user, I say NOREPLICATION and grant it CONNECT on just the 
one database (it's always one) that I intend. However, when I create a 
superuser, I cannot prevent it from connecting to *any* database (present or 
future).
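
Putting the bullets above together, a minimal sketch (the database and user 
names are placeholders):

-- Per database.
create database appdb;
revoke connect on database appdb from public;
\c appdb postgres
drop schema if exists public;

-- Per ordinary user.
create user app_user with login noreplication password 'changeme';
grant connect on database appdb to app_user;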



Re: Restricting user to see schema structure

2022-05-17 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Right, I see the importance of this now. I now realize that when a database 
>> is created, CONNECT on it is automatically granted to PUBLIC. But there's no 
>> mention of this (or what to read to learn that this is the case) in the 
>> "pg_hba.conf" chapter. Nor does the section on the CREATE DATABASE statement 
>> mention this. How is the neophyte supposed to know about this behavior?
> 
> By reading the documentation, specifically the chapter linked before, where 
> this is discussed.
> 
> https://www.postgresql.org/docs/current/ddl-priv.html
> 
> Not saying there isn't room for improvement here, I tend to agree that the 
> SQL Command Reference section should be considered a primary jumping off 
> point.  But the chapter on Data Definition is basically mandatory reading 
> once a DBA wants to do any non-trivial modifications to their cluster.

Thanks again, David. Yes... the section "5.7. Privileges" is essential reading. 
And I do see what I'd missed now. I broke it into bullets to make it (very 
much) easier for me at least to parse.

«
PostgreSQL grants privileges on some types of objects to PUBLIC by default when 
the objects are created.

* No privileges are granted to PUBLIC by default on tables, table columns, 
sequences, foreign data wrappers, foreign servers, large objects, schemas, or 
tablespaces.

For other types of objects, the default privileges granted to PUBLIC are as 
follows:

* CONNECT and TEMPORARY (create temporary tables) privileges for databases;

* EXECUTE privilege for functions and procedures;

* USAGE privilege for languages and data types (including domains).

The object owner can, of course, REVOKE both default and expressly granted 
privileges. (For maximum security, issue the REVOKE in the same transaction 
that creates the object; then there is no window in which another user can use 
the object.) Also, these default privilege settings can be overridden using the 
ALTER DEFAULT PRIVILEGES command.
»

(I'd already been burned by the fact that EXECUTE is granted to PUBLIC on 
newly-created functions and procedures.)

The paragraph describes very surprising behavior in the present era of "secure 
by default". The sentence "For maximum security..." at the end emphasizes this 
and has you go to some effort (CREATE and REVOKE in the same txn) to undo the 
"insecurity by default" paradigm. I s'pose that compatibility on upgrade means 
that nothing can change here.

Might it be possible to give the paragraph more prominence (like make it a note 
and start it with "WARNING" in large letters)? And to x-ref it from the CREATE 
DATABASE section? And from the "pg_hba.conf" section? And correspondingly from 
the CREATE accounts for the objects of the other types?



Re: Restricting user to see schema structure

2022-05-17 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> The paragraph describes very surprising behavior in the present era of 
>> "secure by default". The sentence "For maximum security..." at the end 
>> emphasizes this and has you go to some effort (CREATE and REVOKE in the same 
>> txn) to undo the "insecurity by default" paradigm. I s'pose that 
>> compatibility on upgrade means that nothing can change here.
> 
> There is movement on this front coming in Postgres 15:
> 
> https://www.postgresql.org/docs/devel/release-15.html

Do you mean that, for example, "create database x" will no longer imply "grant 
connect on database x to public" and "create function f()" will no longer imply 
"grant execute on function f() to public"? That would be good. But I can't find 
wording to that effect on the page.

"Join Postgres on Slack" ?

2022-05-25 Thread Bryn Llewellyn
Forgive me if this is the wrong address.

I went to https://postgresteam.slack.com/join/signup but found that my email 
address has an unknown domain. It says "Don’t have an email address from one of 
those domains? Contact the workspace administrator at Postgres for an 
invitation." The word "Postgres" is bolded. But it isn't a link.

What must I do to join?

Re: "Join Postgres on Slack" ?

2022-05-25 Thread Bryn Llewellyn
david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> …I went to https://postgresteam.slack.com/join/signup but found that my 
>> email address has an unknown domain. It says "Don’t have an email address 
>> from one of those domains? Contact the workspace administrator at Postgres 
>> for an invitation." The word "Postgres" is bolded. But it isn't a link. What 
>> must I do to join?
> 
> There is a web form for getting invited.
> 
> https://www.postgresql.org/community/
> https://postgres-slack.herokuapp.com/ 

Thanks, David. That was easy—when you know how!



Re: Is it possible to index "deep" into a JSONB column?

2022-05-29 Thread Bryn Llewellyn
> shaheedha...@gmail.com wrote:
> 
> Suppose I have a JSONB field called "snapshot". I can create a GIN
> index on it like this:
> 
>  create index idx1 on mytable using gin (snapshot);
> 
> In principle, I believe this allows index-assisted access to keys and
> values nested in arrays and inner objects but in practice, it seems
> the planner "often" decides to ignore the index in favour of a table
> scan. (As discussed elsewhere, this is influenced by the number of
> rows, and possibly other criteria too).
> 
> Now, I know it is possible to index inner objects, so that is snapshot
> looks like this:
> 
> {
>"stuff": {},
>"more other stuff": {},
>"employee": {
> "1234": {"date_of_birth": "1970-01-01"},
> "56B789": {"date_of_birth": "1971-02-02"},
>}
> }
> 
> I can say:
> 
>  create index idx2 on mytable using gin ((snapshot -> 'employee'));
> 
> But what is the syntax to index only on date_of_birth? I assume a
> btree would work since it is a primitive value, but WHAT GOES HERE in
> this:
> 
>  create index idx3 on mytable using btree ((snapshot ->'employee' ->
> WHAT GOES HERE -> 'date_of_birth'));
> 
> I believe an asterisk "*" would work if 'employee' was an array, but
> here it is a nested object with keys. If it helps, the keys are
> invariably numbers (in quoted string form, as per JSON).

Try this:

   snapshot -> 'employee' ->> 'date_of_birth'
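
For what it's worth, a hedged sketch of one way to get index support for this 
kind of lookup without knowing the employee key in advance (the table and 
column names come from the question; whether the planner actually uses the 
index still depends on row counts and the query shape):

-- A GIN index over the whole document, using the path-hashing operator class.
create index idx_snapshot_paths on mytable using gin (snapshot jsonb_path_ops);

-- A jsonpath match that finds rows where *any* employee has the given
-- date_of_birth; the @? operator is one of the operators this index supports.
select *
from mytable
where snapshot @? '$.employee.*.date_of_birth ? (@ == "1970-01-01")';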



'{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-03 Thread Bryn Llewellyn
Here’s the minimal testcase:

do $body$
declare
  j1  constant jsonb   not null := '{"x": 42, "y": null}';
  j2  constant jsonb   not null := '{"x": 42   }';

  predicate_1 constant boolean not null := (j1->>'y' is null) AND (j2->>'y' is 
null);
  predicate_2 constant boolean not null := j1 = j2;
begin
  assert predicate_1;
  assert not predicate_2;
end;
$body$;

The block finishes silently.

I certainly expect "predicate_1" to be true. This reflects the defined JSON 
semantics that, within an object, the omission of a key-value pair is the same 
as its presence with a value equal to the (bizarre) JSON null.

As I reason it, the truth of "predicate_1" implies the truth of "predicate_2" 
because "jsonb" claims to represent the underlying semantics of a JSON document 
using its own secret post-parse representation.

Am I missing a use case where an object with a key-value pair with a JSON null 
value is meaningfully different from one where the key is simply absent?

If not, would you consider what I showed to be evidence of a bug?

My discovery let me design other tests.

This block confirms the basic idea that the meaning of a "jsonb" value is 
independent of the formatting of the incoming document that defined it.

do $body$
declare
  t1 constant text not null := '
{
  "x": 42,
  "y": 17
}
';
  j1 constant jsonb not null := t1::jsonb;
  t2 constant text  not null := j1::text;
  j2 constant jsonb not null := t2::jsonb;

  predicate_3 constant boolean not null := t2 = t1;
  predicate_4 constant boolean not null := j2 = j1;
begin
  assert not predicate_3;
  assert predicate_4;
end;
$body$;

Another test (I won't bore you with its code) confirms that the order in which 
the incoming document lists key-value pairs has no consequence for its meaning.
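
A minimal one-line version of that test, for concreteness:

do $body$
begin
  -- "jsonb" normalizes away key order, so these two spellings compare equal.
  assert '{"a": 1, "b": 2}'::jsonb = '{"b": 2, "a": 1}'::jsonb;
end;
$body$;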

Here's another twist on the same basic issue:

create type t_ as (a int, b int);
create function f()
  returns table(z text)
  language plpgsql
as $body$
declare
  j1_in  constant jsonb not null := '{"a": 17, "b": null}';
  j2_in  constant jsonb not null := '{"a": 17   }';

  r1 constant t_ not null := jsonb_populate_record(null::t_, j1_in);
  r2 constant t_ not null := jsonb_populate_record(null::t_, j2_in);

  j1_out constant jsonb not null := to_jsonb(r1);
  j2_out constant jsonb not null := to_jsonb(r2);
begin
  z := j1_out::text;  return next;
  z := j2_out::text;  return next;
end;
$body$;
select f();

This is the result:

{"a": 17, "b": null}
{"a": 17, "b": null}

I have a little demo where I shred a set of "books" incoming JSON documents 
(where a book has a genre and many authors) into the classic Codd-and-Date four 
tables: books, authors, genres, and book_author_intersections. Then I scrunch 
each book back to a single JSON doc. I want to prove that I get back what I 
started with. So I have to clutter what should be a simple test:

  differ constant boolean not null :=
(
with
  a as (select * from j_in  except select * from j_out),
  b as (select * from j_out except select * from j_in )

select (exists(select 1 from a) or exists(select 1 from b))
);

with no end of ad-hoc-ery.

Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-03 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> b...@yugabyte.com wrote:
> 
>> declare
>>   j1  constant jsonb   not null := '{"x": 42, "y": null}';
>>   j2  constant jsonb   not null := '{"x": 42   }';
>> ... 
>> (j1->>'y' is null)
>  
> This produces a JSON Null which when asked for as a text data type results 
> into SQL NULL due to implicit type coercion
> 
>> (j2->>'y' is null)
> 
> This produces "key y not present in JSON" but someone decided that was too 
> unfriendly and so we instead produce SQL NULL. This SQL NULL exists for a 
> different reason than the SQL NULL in the previous case.  Unfortunately, 
> there is only a single concept of NULL in the SQL Language.
> 
>>   predicate_2 constant boolean not null := j1 = j2;
> 
> This is trivially false, j1 has a key of "y" while j2 does not. If there is a 
> complaint to be had, this returning false isn't it.
> 
>> Another test (I won't bore you with its code) confirms that the order in 
>> which the incoming document lists key-value pairs has no consequence for its 
>> meaning.
> 
> Yes, the equality test is semantic, do all the same keys and corresponding 
> values exist?  If so, return true.
> 
>>   j1_in  constant jsonb not null := '{"a": 17, "b": null}';
>>   j2_in  constant jsonb not null := '{"a": 17   }';
>> 
>> This is the result:
>> 
>> {"a": 17, "b": null}
>> {"a": 17, "b": null}
> 
> Yep, for basically the same reason as the first point.  The nulls are 
> consequences of different situations (lack of key, value of key being null) 
> being largely undistinguishable at the SQL level. We provide a "does key 
> exists" operator if you do need to make that determination.
> 
>> I have a little demo where I shred a set of "books" incoming JSON documents 
>> (where a book has a genre and many authors) into the classic Codd-and-Date 
>> four tables: books, authors, genres, and book_author_intersections. Then I 
>> scrunch each book back to a single JSON doc. I want to prove that I get back 
>> what I started with.
> 
> Yeah, the lack of any goal of round tripping conversions from JSON through 
> SQL and back into SQL makes proving that the system does such a thing 
> problematic. You'll get a back something meaningfully equivalent, by your own 
> argument, but not identical on a key-by-key basis.

Thanks for attempting to satisfy me, David—but I'm sad to say that I remain 
somewhat unsatisfied. This is doubtless my fault. I suppose that I knew, before 
I asked, that the ultimate answer would be "It is what it is and it will never 
change."

Of course, I can predict the outcomes of countless tests on the basis that I 
understand "what it is". Here's just one such (just like you suggested):

do $body$
declare
  j1 constant jsonb not null := '{"a": 1, "b": null}';
  j2 constant jsonb not null := '{"a": 1   }';

  n1 constant int := (select count(*) from jsonb_object_keys(j1));
  n2 constant int := (select count(*) from jsonb_object_keys(j2));
begin
  assert (j1 ? 'b');
  assert not (j2 ? 'b');
  assert (n1 = 2) and (n2 = 1);
end;
$body$;

The idea of "null" is a strange beast just within PostgreSQL—let alone all SQL 
systems. The classic understanding of its semantics is "There is simply no 
information available about the value". And this explains the treacherous 
three-by-three outcome table for operators like OR and AND. However, this 
understanding can't cope with the outcome here:

create type t_ as (a int, b int);
do $body$
declare
  b1 constant boolean := null;
  b2 constant boolean := null;
  r1 constant t_ not null := (b1, b2);
  r2 constant t_ not null := (b1, b2);
begin
  assert (b1 = b2) is null;
  assert (b1 is not distinct from b2);

  assert (r1 = r2);
end;
$body$;

I'd've thought that I'd need to weigh in with "is not distinct from" to get 
"r1" and "r2" to compare as the same just as I do with "b1" and "b2". So maybe 
it's not surprising that "null" in JSON is quirky too.

I'm going to try to think like this:

The number of possible spellings of the names of keys in a JSON object is some 
flavor of infinite. So including this in an object:

"k": null

really is saying something. It says that I do know about "k" and that yet I 
have simply no information available about its value.

The quirkiness that my first example showed can be seen differently from how I 
saw it first. When I now consider this expression:

('{"x": 42}'::jsonb)->>'y'

it seems that its evaluation should simply raise an exception. But you said:

> This produces "key y not present in JSON" but someone decided that was too 
> unfriendly and so we instead produce SQL NULL.


Oh well, I know how to program the cases that I care about to get the outcomes 
that I want. It just means lots of typing. But that's anyway what one signs up 
for who decides to work with JSON…



"A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

2022-06-12 Thread Bryn Llewellyn
Does the “Tip” call-out box, from which the “Subject” here is copied, and the 
larger story that I copied below, apply even when the executable section of the 
block statement in question does nothing at all that could be rolled back?

This is my block:

begin
  year_as_int := year_as_text;
exception
  when invalid_text_representation then
bad_integer := true;
end;

The typecast to integer will cause an error if the input text does not 
represent an integer. I need to detect this, when I get such a text value, and 
then go on to do something appropriate after the block. My tests show that I 
get exactly the behavior that I want. It would be tedious to program the test 
(presumably by typecasting to a numeric value and comparing that with the 
result of rounding it) to avoid the "significant" performance penalty that the 
doc describes.

It seems to me that the PG implementation has, in a block that has an exception 
section, an opportunity to wait to start a subtransaction until it first hits a 
statement whose effect could be rolled back—and, of course, not do this if it 
doesn't hit such a statement. I see that this would require first preparing the 
statement and only then, if it needs it, starting its subtransaction. But this 
sounds doable.
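
For what it's worth, a rough timing sketch of the cost difference (the loop 
count and variable names are mine; neither loop makes any database changes):

do $body$
declare
  t0 timestamptz;
  x  int;
begin
  t0 := clock_timestamp();
  for i in 1..1000000 loop
    x := i;
  end loop;
  raise notice 'plain block:          %', clock_timestamp() - t0;

  t0 := clock_timestamp();
  for i in 1..1000000 loop
    begin
      x := i;
    exception
      when others then null;
    end;
  end loop;
  raise notice 'with exception block: %', clock_timestamp() - t0;
end;
$body$;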


The current PL/pgSQL doc, in this section:

43.6.8. Trapping Errors
https://www.postgresql.org/docs/current/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING
 

says this:

> When an error is caught by an EXCEPTION clause, the local variables of the 
> PL/pgSQL function remain as they were when the error occurred, but all 
> changes to persistent database state within the block are rolled back. 
> …
> A block containing an EXCEPTION clause is significantly more expensive to 
> enter and exit than a block without one. Therefore, don't use EXCEPTION 
> without need.

and then on Stack Overflow, here
https://stackoverflow.com/questions/46409856/why-is-catching-errors-inside-a-loop-causing-performance-issues
 


Tom Lane said:

> Yeah, it's about the overhead of setting up and ending a subtransaction. 
> That's a fairly expensive mechanism, but we don't have anything cheaper that 
> is able to recover from arbitrary errors.

and David G. Johnston said:

> Setting up the pl/pgsql execution layer to trap "arbitrary SQL-layer 
> exceptions"​ is fairly expensive. Even if the user specifies specific errors 
> the error handling mechanism in pl/pgsql is code for generic (arbitrary) 
> errors being given to it.




Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

2022-06-13 Thread Bryn Llewellyn
> pavel.steh...@gmail.com wrote:
> 
>> b...@yugabyte.com napsal:
>> 
>> Does the “Tip” call-out box, from which the “Subject” here is copied, and 
>> the larger story that I copied below, apply even when the executable section 
>> of the block statement in question does nothing at all that could be rolled 
>> back?
>> 
>> This is my block:
>> 
>> begin
>>   year_as_int := year_as_text;
>> exception
>>   when invalid_text_representation then
>> bad_integer := true;
>> end;
>> 
>> The typecast to integer will cause an error if the input text does not 
>> represent an integer.
> 
> The block is a PLpgSQL statement (internally it is not just syntax) - and a 
> savepoint is created before execution of any statement inside the block, when 
> the block has EXCEPTION parts. PL/pgSQL is an AST interpreter. It knows 
> nothing about statements inside the block. The lazy implementation 
> theoretically can be possible, but why? Any statement can raise an exception. 
> There is not any evidence what statements are safe and what not.

Thanks for the quick response. I'll take this to mean that the present behavior 
will never change—in spite of your:

> The lazy implementation [that Bryn sketched] theoretically can be possible

You then said:

> but why?


OK, so I'm obliged to answer.

Because SQL rests on the principle that you just say *what* you want but not 
*how*. Here, I want to cast my string, which putatively represents an integer, 
to an "int" value. The text comes from the outside world, and what is meant to 
be "42017" might arrive as "42O17". Or, might arrive properly, as "42000e-03".

Consider this:

create function is_int_nn(t in text)
  returns boolean
  language plpgsql
as $body$
declare
  ok boolean not null := (t is not null);
begin
  if ok then
-- Placeholder naïve REGEXPR test.
ok := (t != '') and not (t ~ 'x');
  end if;

  if ok then
declare
  n  constant numeric not null := t;
  r  constant numeric not null := round(n);
begin
  ok := (r = n);
end;
  end if;

  return ok;
end;
$body$;

select
  (select is_int_nn(null   )::text) as test_1,
  (select is_int_nn('' )::text) as test_2,
  (select is_int_nn('42000x-04')::text) as test_3,
  (select is_int_nn('42000e-04')::text) as test_4,
  (select is_int_nn('42000e-03')::text) as test_5;

Neither the design nor the implementation of the code is by any means finished 
yet. And this holds, therefore, for the tests too. So this is a loud denial of 
« just say *what* you want ».

You might argue that any junior programmer could manage the complete exercise 
in a morning. But then somebody else has to review it. And it's another 
artifact to be managed. Generic utilities like this always present a challenge 
when they need to be used in more than one distinct application. You need to 
invent a "single source of truth" scheme.

Compare everything that "function is_int_nn(t in text)" implies with the block 
that I showed above.

Oracle Database 12c Release 2 (and later) has a validate_conversion() built-in.

https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/VALIDATE_CONVERSION.html#GUID-DC485EEB-CB6D-42EF-97AA-4487884CB2CD

This arrived in (some time around) 2017.

Is there any chance that PG might ship an equivalent?





Re: "A block containing an EXCEPTION clause is significantly more expensive to enter and exit than a block without one"

2022-06-13 Thread Bryn Llewellyn
> x...@thebuild.com wrote:
> 
> Isn't one of the reasons for the savepoint (in fact, the principal reason) to 
> reset the connection back to non-error state so that execution can continue? 
> In that case, it really does need to create the savepoint at the start of the 
> block, regardless of what's in it, since any statement can raise an error.

> t...@sss.pgh.pa.us wrote:
> 
> It also rests on the principle that the programmer shouldn't be too concerned 
> about micro-efficiencies. You've given a perfectly good six-line 
> implementation of what you want; use it and be happy.
> 
> ...
> 
> [ shrug... ] We are not Oracle. One of the main ways in which we are not 
> Oracle is that we support extensible database functionality. To write a 
> "validate_conversion" function that supports extension datatypes, but doesn't 
> use something morally equivalent to a subtransaction, would be a nightmare: 
> large, fragile, and probably not all that much faster.

> pavel.steh...@gmail.com wrote:
> 
> I remember a long and still not closed discussion about fault tolerant copy 
> implementation. The problem is a lot of possibly redundant code for exception 
> safe input functions, if I remember well. And it is not enough for fault 
> tolerant copy still. Maybe it needs some refactoring of the PostgreSQL 
> exceptions handling system to be able to handle some exceptions that come 
> from a non-storage engine without the necessity to use savepoints. I have no 
> idea if somebody is working on this issue now, but I don't expect so it is 
> easy to fix it. Maybe a more probable fix can be to reduce an overhead of 
> savepoints. This issue is more complex than can be visible from user 
> perspective - and the complexity is based on how pg has implemented 
> exceptions.
——

Thanks, Tom Lane, for your reply. The note in the doc:

> A block containing an EXCEPTION clause is significantly more expensive to 
> enter and exit than a block without one. 


uses the word "significantly". This scares the application programmer. Would 
you (all) consider this revised wording:

« If the executable section of a block that has an exception section makes 
changes to the database, then this brings significantly more expense than when 
there is no exception section. However, if the executable section makes only 
changes to local variables or session parameters, then the additional expense 
brought by an exception section is negligible. »

Oracle Database users had to wait about 40 years for the 
"validate_conversion()" built-in—despite strong and persistent user-pressure. 
PostgreSQL is about 25 years old. So there's plenty of time to reconsider...
——

Thanks, Pavel Stehule, for your reply. Forget savepoints and blocks with 
exception sections. And forget the general case that Oracle's 
"validate_conversion()" handles.

It would seem to me that the code that implements "year_as_int := year_as_text" 
and, maybe, says "text into int won't go", is distinct from the code that it 
notifies to raise an error. It ought to be possible to factor out this code for 
re-use and use it both to do what it presently does and to implement a built-in 
"is_int_nn(t in text)". And maybe the same for numeric and for timestamp[tz]. 
These must be the common cases when such putative values come in from the UI. 
Proper practice insists on re-validating the conversions in the database even 
though it's to be hoped that the UI will have done this.

However, in the light of Tom's « You've given a perfectly good six-line 
implementation of what you want; use it and be happy. », the need (almost) 
vanishes—except for the point that I mentioned earlier about packaging up the 
test in the application development shop's "utilities" kit.
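
For what it's worth, a hedged sketch of how the exception-based approach might 
be packaged as such a utility (the function name is mine; the numeric-then-round 
trick mirrors the earlier code, and it passes the same five tests):

create function is_int_nn_v2(t in text)
  returns boolean
  language plpgsql
as $body$
declare
  n numeric;
begin
  if t is null then
    return false;
  end if;
  n := t;                -- raises "invalid_text_representation" for bad input
  return n = round(n);   -- true for integral values, however they are spelled
exception
  when invalid_text_representation then
    return false;
end;
$body$;

select
  (select is_int_nn_v2(null       )::text) as test_1,
  (select is_int_nn_v2(''         )::text) as test_2,
  (select is_int_nn_v2('42000x-04')::text) as test_3,
  (select is_int_nn_v2('42000e-04')::text) as test_4,
  (select is_int_nn_v2('42000e-03')::text) as test_5;
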
——

Thanks, Christophe Pettus, for your reply.

> it really does need to create the savepoint at the start of the block, 
> regardless of what's in it, since any statement can raise an error.


This is necessary only when it can't be proved that the executable section that 
precedes the exception section cannot make database changes—hereinafter "is 
safe". Such safety tests are always over cautious. So, for example, it would be 
assumed that any transfer of control out of the executable section, and then 
back, was unsafe.

However, the experts have insisted that even the safety of this case cannot be 
proved:

> an executable section with only simple assignments that use only built-in 
> functionality.

I suppose that "PL/pgSQL is an AST interpreter" is the clue here.

Given this, then yes, you're right.




ERROR: failed to find conversion function from key_vals_nn to record[]

2022-06-15 Thread Bryn Llewellyn
I’ve copied a self-contained testcase below. Is the error that the "as 
intended" test causes due to a known limitation—or even a semantic dilemma that 
I'm failing to spot? Or might it be due to a bug?

If you're interested, the testcase rests on this thinking:

Many SQL experts claim that the RDBMS pioneers made a mistake when they made 
data types nullable by default—and they argue that the optional modifier should 
have been *nullable*. Anyway, the philosophy is clear:

«
Nulls bring all sorts of risks to the programmer that are trivially avoided in 
most cases where you don’t anyway want nulls.
»

However, it’s a pain to have to remember to write "not null" in a gazillion 
places. And the user-defined domain (conspicuous by its absence in Oracle 
Database) is the perfect device to impose your intentions from a single point 
of maintenance.

I've gone to the max. with the "nn" domain approach in my testcase. It uses a 
composite type thus:

  type key_val as (k text_nn, v text_nn);

(At least the "text_nn" idea doesn't cause problems.)

It uses the "any()" array function to test if a given "key_val" value is found 
in an array of such values.

The error occurs when I write what I mean, using a "key_val_nn" value and 
a "key_vals_nn" value.

I can work around the problem by typecasting my values back to their base 
"key_val" and "key_val[]" values by hand.

So I'm surprised that PG can't manage this typecasting for itself.

——

create domain text_nn as text not null;
create type key_val as (k text_nn, v text_nn);

create domain key_val_nn  as key_val not null;
create domain key_vals_nn as key_val_nn[] not null;

create function f(which in text)
  returns text
  language plpgsql
as $body$
declare
  -- Use the raw composite type.
  kv1 constant key_val := ('a', 'b');
  kv2 constant key_val := ('a', 'b');
  kvs constant key_val[]   := array[kv1, kv2];

  -- Use the domains that bring null-safety.
  kv1_nn  constant key_val_nn  := ('a', 'b');
  kvs_nn  constant key_vals_nn := array[kv1, kv2];

  -- Typecast the null-safe values back to the raw composite type.
  kv1_0   constant key_val := kv1_nn;
  kvs_0   constant key_val[]   := kvs_nn;
begin
  case which
when 'without NOT NULL safety' then
  return (kv1 = any(kvs));
when 'as intended' then
  return (kv1_nn = any(kvs_nn));
when 'workaround' then
  return (kv1_0 = any(kvs_0));
  end case;
end;
$body$;

select f('without NOT NULL safety');
select f('workaround');

/*
  This one causes the error, thus:

  ERROR:  failed to find conversion function from key_vals_nn to record[]
  CONTEXT:  SQL expression "(kv1_nn = any(kvs_nn))"
*/;
select f('as intended');



Re: ERROR: failed to find conversion function from key_vals_nn to record[]

2022-06-15 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> I’ve copied a self-contained testcase below. Is the error that the "as 
>> intended" test causes due to a known limitation—or even a semantic dilemma 
>> that I'm failing to spot? Or might it be due to a bug?
> 
> I read the note in create domain as basically “don’t do this” (the not null 
> part) but the issue you are pointing out seems unrelated to that. 
> 
>> /*
>>   This one causes the error, thus:
>> 
>>   ERROR:  failed to find conversion function from key_vals_nn to record[]
>>   CONTEXT:  SQL expression "(kv1_nn = any(kvs_nn))"
>> */;
>> select f('as intended');
> 
> The fact that a domain over an array isn’t being seen as an array here seems 
> like a bug.  POLA violation at least, and I don’t recall any notes regarding 
> this dynamic in the docs.
> 
> However, a more trivial case does work, at least in HEAD:
> 
> create domain mytext as text[] not null;
> select '1' = any(array['1','2']::mytext);
>  ?column?
> --
>  t
> 
> However, as you show:
> 
> create type kv AS ( key text, val text );
> create domain kvarr as kv[];
> select ('1','one')::kv = any (array[('1','one')::kv]);
>  ?column?
> --
>  t
> (1 row)
> 
> select ('1','one')::kv = any ((array[('1','one')::kv])::kvarr);
> ERROR:  failed to find conversion function from kvarr to record[]
> 
> So the interaction of a composite type and the domain over array seems to be 
> the scope of the issue - which makes me thing bug even more.

Thanks for the quick response, David. I'll use my workaround for now. And I'll 
look out for news about a possible fix.

About this from the "create domain" doc:

«
Best practice therefore is to design a domain's constraints so that a null 
value is allowed, and then to apply column NOT NULL constraints to columns of 
the domain type as needed, rather than directly to the domain type.
»

What an enormous disappointment. It defeats a large part of what I wanted to 
adopt as a practice. As I understand that “don’t do this” caution, and the 
discussion that surrounds it, the advice applies only to the case that a domain 
with a not null constraint is used as the data type of a column in a table. I 
tried this variant on what the doc has:

create domain text_nn as text not null;
create table t1(k int primary key, v text not null);
insert into t1(k, v) values(1, 'x');

-- Causes:
-- null value in column "v" of relation "t1" violates not-null constraint
insert into t1(k, v) values(2, (select 'y'::text_nn where false));

Right, the subquery evaluates to "null". Then failure comes, as is intended, 
when the attempt is made to assign "null" to "t.v" in the to-be-inserted row.

Then I repeated the test like this:

create table t2(k int primary key, v text_nn);
insert into t2(k, v) values(1, 'x');
insert into t2(k, v) values(2, (select 'y'::text_nn where false));

\pset null ''
select
  k,
  v,
  pg_typeof(v) as "pg_typeof(v)"
from t2;

No error—and this result:

 k | v | pg_typeof(v)
---+---+--------------
 1 | x | text_nn
 2 |   | text_nn

This is what the doc promises. But how can you see it as anything but a bug? 
The subquery evaluates to "null", and only then is the attempt made to create a 
new row which self-evidently violates the domain's constraint. How is it any 
different from this:

insert into t2(k, v) values(1, null);

This obligingly causes "domain text_nn does not allow null values".









Re: Operators on ranges with a domain subtype do not implicitly cast operands of the domain's base type

2022-06-16 Thread Bryn Llewellyn
> dpopow...@artandlogic.com  wrote:
> 
> I'm using PG 14 and have an application using a custom range with a custom 
> domain subtype. My problem: PG does not do an implicit cast from the domain's 
> base type when used with range operators.

I hit what looks to be the same issue. I reported in here:

https://www.postgresql.org/message-id/37d76918-6fd7-4598-a409-a7091687d...@yugabyte.com
 


David Johnston said that he thought that it was a bug.

My workaround is to typecast the two operands explicitly back to their base 
types.

I don't think that there's any magic to fix it declaratively. My guess is that 
you might work around it with a user-defined operator for the domains in 
question that hides the typecasts in its implementation function. (This has 
worked for me in other cases for other reasons. But I didn't try that in my 
testcase.)

Re: ERROR: failed to find conversion function from key_vals_nn to record[]

2022-06-16 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> This is what the doc promises. But how can you see it as anything but a bug? 
>> The subquery evaluates to "null", and only then is the attempt made to 
>> create a new row which self-evidently violates the domain's constraint. How 
>> is it any different from this:
> 
> Because NULL is doing double-duty. The absence of any possible value is 
> represented by null, as is knowing that a value exists but not knowing what 
> that value is. The most obvious example of this problem is...
> 
> ...The resultant value of "b.lbl" is null in both queries, but not for the 
> same reason (left join missing semantics pre-empt null type value semantics).
> 
> So, yes, queries can produce NULL even for domains that are defined not null.
> 
> If you want protection from null values in your database tables you must 
> define your columns to not accept nulls. It becomes a habit and can be 
> readily checked for in the catalogs (I suggest adding a column comment for 
> why columns defined as null are allowed to be null, then you can query the 
> column contents to exclude those columns where null was intended - or go with 
> your idea and just report every column as non-conforming. COMMENT ON 
> table.column IS '@NULLABLE - optional information the customer might not 
> provide').

First off, thank you very much for the tip to use "comment on" together with 
catalog queries to police a practice policy. Meanwhile, I've noted the 
unsurprising fact that you can write a "not null" constraint in the "create 
table" statement that defines a column's data type using a domain that, too, 
has such a constraint.

Maybe a slightly more lightweight practice could be to use domains like 
"text_nn" and "text_uc" (for unconstrained text) and then in "create table" 
statements and similar add the explicit constraint for the "_nn" case. Using 
"text_uc" and similar would support searching the catalog, too. And both 
approaches are vulnerable to ordinary human error that would be very hard to 
detect. Typing an explanation in any kind of comment, including external prose 
doc, always has this character.

Back to NULLs...

Your code examples ran without error and produced the results that you 
described. I do understand the fact that, on its face,  the NULLs in the two 
cases arise for different reasons. But this (still) seems to me to be a 
distinction without a difference. It rather reminds me of my earlier discussion 
with you (all) about the distinction (in the world of JSON using "jsonb") 
between the presence of an object key "k" with the value "JSON null" and the 
absence of key "k".

The semantic proposition behind the "outer join", as it seems to me, is 
inextricably bound up with the notion that, in the resulting rows, one table 
might not have a partner row with the other. (It doesn't matter here which 
table lacks the partner or if you decide to spell your query so that "right" is 
the appropriate choice or "left" is—as long as you spell the whole thing 
correctly to express your intention.) And the "outer join" semantics bring the 
notion that you simply have no information about the facts that, were it 
present, the missing row might have given you. Whoever it was on the Committee 
back in the day, decided in concert to represent this "no information" outcome, 
in the relation that results for an "outer join", as an "emergent" SQL NULL.

I've talked endlessly about NULL, over the years and face-to-face, with 
colleagues whose reasoning ability and lucidity I hugely respect. They are 
unwavering in how they explain NULL. It says simply: "I have absolutely no 
information about the value that I sought to interrogate." And these experts 
argue that there are no flavors of the bare fact of having no information. They 
argue, too, that to say "the value of this variable (or row-column 
intersection) is NULL" is an oxymoron because the absence of information is not 
a value—in the purist sense.

I copied a reasonably realistic example, below, where the "outer join" is 
between two physical tables, "genres" and "books". They have properly defined 
PKs. And there's an FK that points in the direction that the table names 
suggest. Here's the requirement:

«
Each book may be labelled by exactly one (known) genre.
Each genre may label one or several books.
»

It's the canonical list-of-values use case. It's perfectly OK to have a book 
for whose genre there is simply (as yet) no information available. And it's 
also perfectly OK to have a genre that, as yet, labels no book.

The PK columns inevitably end up as "not null". And the FK column is 
deliberately nullable.

I avoided domains altogether, using explicit "not null" constraints, or not, as 
appropriate to the requirements set out above.

Notwithstanding the proper PK and FK declarations, I cannot stop the "genres" 
table having a row that has no "books" partner. And this is exactly what I want.
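
A minimal sketch of the schema just described ("genres" uses the gk/gv columns 
from my actual code; the "books" column names are assumptions):

create table genres(
  gk  int   primary key,
  gv  text  not null);

-- The FK column is deliberately nullable.
create table books(
  bk  int   primary key,
  bv  text  not null,
  gk  int   references genres(gk));

insert into genres(gk, gv) values (1, 'sci-fi'), (2, 'romance');
insert into books(bk, bv, gk) values
  (10, 'a book with a known genre', 1),
  (20, 'a book whose genre is not yet known', null);

-- The "romance" row has no "books" partner, so "bv" comes back as an
-- emergent SQL NULL in the outer join's projection.
select g.gv, b.bv
from genres g left outer join books b on b.gk = g.gk
order by g.gk, b.bk;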

My example crea

Re: ERROR: failed to find conversion function from key_vals_nn to record[]

2022-06-16 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Can anybody show me an implementation of a realistic use case that follows 
>> proper practice — like "every table must a primary key", "a foreign key must 
>> refer to a primary key", and "joins may be made only "on" columns one of 
>> which has a PK constraint and the other of which has a FK constraint" — 
>> where using a not nullable data type brings a problem that wouldn't occur if 
>> the column were defined with a nullable data type and an explicit "not null" 
>> constraint?
> 
> Nothing obvious comes to mind. But frankly:
> 
> proper practice includes trying to write idiomatic code for the language you 
> are using so others familiar with the language can learn your code more 
> easily. You are violating this to an extreme degree.
> 
> I do not think it to be a good trade-off. SQL writers are practical people 
> and the idioms largely avoid any downsides that the arise from SQL not being 
> some paragon of language design.
> 
> -- "\d genres" shows "gk" with a "not null" constraint, whether I write it
> -- or not. And convention seems to say "don't clutter you code by writing it".
> create table genres(
>   gk  int   primary key,
>   gv  text  not null
>   );
> 
> "Primary Key" is defined to be the application of both UNIQUE and NOT NULL 
> constraints...

Yes, I know what "primary key" implies. I meant only to emphasize that the 
source column for what the "outer join" projects has a not null constraint, 
that it doesn't apply to the projection of that column, that this is perfectly 
understandable, and that this isn't a problem. Never mind.

What part of the code that I showed (the "genres" and "books" use case) 
violated, to an extreme degree, what you would have wanted me to write—and in 
what way?

Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-17 Thread Bryn Llewellyn
> mmonc...@gmail.com  wrote:
> 
> yeah. I would expect for json or jsonb, two values, "a, b",  "a" is distinct 
> from "b" should give the same answer as "a::text is distinct from b::text".
> 
>> t...@sss.pgh.pa.us  wrote:
>> 
>>> b...@yugabyte.com  wrote:
>>> 
>>> I'm going to try to think like this: The number of possible spellings of 
>>> the names of keys in a JSON object is some flavor of infinite. So including 
>>> this in an object:
>>> 
>>> "k": null
>>> 
>>> really is saying something. It says that I do know about "k" and that yet I 
>>> have simply no information available about its value.
>> 
>> I'd read it as asserting that key "k" is meaningful for this object, but the 
>> correct value for that key is not known.
>> 
>> I have a hard time with your assertion that {"x": 42, "y": null} should be 
>> considered equivalent to {"x": 42}, because it would
>> render key-exists predicates useless. Either you have to say that key "y" is 
>> claimed to exist in both of these objects and indeed every object, or you 
>> have to make it fail if the key's value is null (so that it'd say "false" in 
>> both of these cases). Either of those options seems both weird and useless.
>> 
>>> The quirkiness that my first example showed can be seen differently from 
>>> how I saw it first. When I now consider this expression:
>>> 
>>> ('{"x": 42}'::jsonb)->>'y'
>>> 
>>> it seems that its evaluation should simply raise an exception. But you said:
>>> 
 This produces "key y not present in JSON" but someone decided that was too 
 unfriendly and so we instead produce SQL NULL.
>> 
>> Right. This is hard to justify from a purist semantic point of view, but 
>> having the operator throw an error in such cases would make it close to 
>> unusable on not-uniformly-structured data. And really the point of using 
>> JSON inside a SQL database is to cope with irregularly-structured data, so 
>> fuzziness seems like what we want.

Thank you very much for this, Tom. And thanks again to you, David, for your 
earlier input. I hope that it's clear that the purpose of my questions is to 
discover what I'm missing—both w.r.t. actual semantics and w.r.t. the use cases 
that motivate PostgreSQL's functionality. Sorry if, contrary to my intention, 
my questions (here and on other topics) might seem to challenge established 
wisdom and precedent.

I delayed my reply until I'd had time to think, to do some study, and (most 
importantly) to implement a complete, self-contained proof-of-concept to 
substantiate my conclusion. I'm ready, now, to report back.

Summary: I can meet my goal by using PG's native functionality appropriately.

So, w.r.t. this email’s subject, this:

  select strip_null_keys('{"x": 42, "y": null}'::jsonb) = '{"x": 42}'::jsonb;

returns "true".

The point at issue is whether the presence of « "some key": null » is different 
from the absence of "some key". And you (all) have argued that the two 
locutions for what I have wanted to see as one notion are indeed different. 
Well, yes, of course they are. This is tautologically true if you think only of 
the Unicode text of a JSON document that's so far waiting to be ingested by who 
knows what system. And you pointed out that, in PostgreSQL

  my_doc ? 'some key'

detects the difference. You've convinced me that some use cases will care about 
this—in other words, it's a distinction *with* a difference. But, in my use 
case, the two distinct locutions bring no semantic difference. But the actual 
distinction causes me a problem when I try to prove that this pair of 
transformations is idempotent:

  JSON → relational → JSON

But that problem is behind me now.

So... (and if you read this far at all) you can stop now unless you're 
interested to read further.

Before getting to my use case, I noted that Tom said this about the fact that 
using ->> to read a non-existent key returns a SQL null:

> [This] is hard to justify from a purist semantic point of view, but having 
> the operator throw an error in such cases would make it close to unusable on 
> not-uniformly-structured data.


It's easy to fix this, when the app requires strictness, by implementing a 
user-defined operator pair, say +>> and its partner. I did this (to respect 
what my use case needs) —and it passed all my tests. However, it turned out, 
for reasons that I explain below, that I didn't need it. The operator's 
implementation function reads the value with the appropriate native operator 
and only if it returns SQL null (or JSON null) does it do
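
A hedged sketch of the strict-read half of such an operator pair (the function 
body, its name, and the error text are assumptions of mine):

-- Raises an error when the key is absent; otherwise behaves like native ->>.
create function strict_text_field(j in jsonb, key_name in text)
  returns text
  language plpgsql
as $body$
begin
  if not (j ? key_name) then
    raise exception 'key % not present in JSON', key_name;
  end if;
  return j ->> key_name;
end;
$body$;

create operator +>> (leftarg = jsonb, rightarg = text, function = strict_text_field);

-- select '{"x": 42}'::jsonb +>> 'x';  -- returns '42'
-- select '{"x": 42}'::jsonb +>> 'y';  -- raises the "not present" error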

Re: ISBN (was: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?)

2022-06-18 Thread Bryn Llewellyn
> hjp-pg...@hjp.at wrote:
> 
>> Bryn wrote:
>> 
>> "isbn" — string
>> values must be unique across the entire set of documents (in other words, it
>> defines the unique business key); values must have this pattern:
>> 
>>  « ^[0-9]{3}-[0-9]{1}-[0-9]{2}-[0-9]{6}-[0-9]{1}$ »
> 
> Completely off-topic, but this regexp doesn't describe ISBNs. In ISBNs the 
> three middle subfields are all variable length. The first is a language code 
> (there are more than 10 languages in the world), the second identifies the 
> publisher (there are more than 100 publishers) and the third the book. For 
> example, "Mastering PostgreSQL 9.6" has the ISBN 978-1-78355-535-2.

Yes, I know. Sorry. I should have said that I simply wanted to illustrate a 
proof of concept for the notion, viable (only?) when you have incoming JSON 
documents with a well-defined schema, that this is idempotent:

  JSON → relational → JSON

And it's of note that PostgreSQL has had what you need to do the xform, in each 
direction, for a long time.

I simply typed up my "corpus" by hand. It didn’t matter that the ISBN format 
was a fairy tale. A book does have an ISBN and (as far as my P.o.C. is 
concerned) it's the unique business key. I should have cut out the REGEXP in my 
email and said something like "must conform to the specified format". But I was 
careless in my proofreading.

Thanks for not mentioning that books have publishers—so that "authors" isn't 
the only key that implies a Codd-and-Date many-to-many!



Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-18 Thread Bryn Llewellyn
> haram...@gmail.com wrote
> 
>> b...@yugabyte.com wrote:
>> 
>> I implemented two complementary functions:
>> 
>> —"no_null_keys()" checks that a "jsonb" value has no occurrences of « "some 
>> key": null »
>> 
>> —"strip_null_keys()" removes « "some key": null » occurrences from a "jsonb" 
>> value
>> 
>> The code checks with "no_null_keys()" that, as expected, no ingested JSON 
>> document has an occurrence of « "some key": null ».
>> 
>> And it uses "strip_null_keys()" on the output of "to_jsonb()" — and, as 
>> appropriate, any other built-in JSON function that produces a "jsonb" value.
>> 
>> It was straightforward to implement these two functions by using REGEXP 
>> built-in functionality on the canonically formatted "text" value produced by 
>> the "jsonb::text" typecast.
> 
> In my experience, using regular expressions applied to document formats tends 
> to get you false positives. I’d be worried about books with titles similar to:
> 
> «
> How we wrote a regular expression to detect occurrences of "some key": null 
> in our JSON documents
> »
> 
> For stripping those null occurrences, you are aware of the 
> json_strip_nulls(json) and jsonb_strip_nulls(jsonb) functions, right?
> 
> For detecting them on a recent PG, the @? operator or json_path_exists(json, 
> jsonpath) functions would probably do the trick.
> I am not too familiar with JSONPATH expressions, but I expect (it passed some 
> preliminary testing) this would detect your nulls just fine, while taking 
> JSON semantics into account:
> 
> jsonb_path_exists(document, '$.** ? (@ == null)'::jsonpath)
> 
> For PG-specifics on JSONPATH, see section 9.16.2: 
> https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-OP-TABLE
> 
> A recursive query is another possible solution. It would probably perform far 
> worse...

Thank you so much, Alban, for taking an interest in my « "some key": null » 
saga.

The "G", "E", "L", and "O" keys on my keyboard were broken. They're fixed now, 
and so "postgres strip keys with null value from jsonb object" got me to this 
(but, strangely, not to the PG docs):

docs.yugabyte.com/preview/api/ysql/datatypes/type_json/functions-operators/jsonb-strip-nulls/

This is embarrassing. I wrote that doc, along with the rest of the content in 
the "JSON data types and functionality" section just before COVID hit us. 
Apparently, life under lockdown has damaged my corpus callosum. I typed up my 
proof-of-concept code and emails with one half of my brain—and it was the other 
half that wrote that documentation.

So thanks for the memory jog. My excuse is that (as was the case with my 
malformed ISBN that Peter Holzer pointed out in a separate thread) I wanted 
just to show myself, as a proof-of-concept, that stripping nulls was 
feasible—so I gave it no more thought once I'd done that. But, I suppose, 
that's no excuse...

Anyway, my "strip_null_keys()" is already on the scrapheap. And the body of my 
"no_null_keys()" reduced to a single line:

create function no_null_keys(j in jsonb)
  returns boolean
  immutable
  language sql
as $body$
  select j = jsonb_strip_nulls(j);
$body$;
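
An ad hoc sanity check, using the value from this thread's subject:

  select
    no_null_keys('{"x": 42}'::jsonb)            as "no null keys",
    no_null_keys('{"x": 42, "y": null}'::jsonb) as "has a null key";

This shows "true | false", as expected.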

You might argue that I don't need to bother with the encapsulation. But it 
makes testing easier—and I'm trusting that inlining works as advertised.

Your point about false positives is well taken. So, just for sport:

create type t1 as (k int, v text);
create type t2 as (a int, b int, c t1, d t1, e text[]);

create function j()
  returns jsonb
  language plpgsql
as $body$
declare
  t    constant text    not null := 'How we wrote a regular expression to detect occurrences of « "some key": null » in our JSON documents!';
  c1   constant t1      not null := (17, t);
  c2   constant t1      not null := (29, null);
  arr  constant text[]  not null := array['x', null::text, t];
  r    constant t2      not null := (42, null, c1, c2, arr);
begin
  return to_jsonb(r);
end;
$body$;
select jsonb_pretty(j());

The output includes two occurrences of this:

"How we wrote a regular expression to detect occurrences of « \"some key\": 
null » in our JSON documents!"

I believe that the "jsonb" to "text" conversion never produces an isolated 
double-quote within the representation of an object key's value. I checked that 
my "strip_null_keys()" handled your example before consigning it to the 
scrapheap—and it didn't let me down. But it would be foolish to argue that 
there isn't some way to provoke a false positive.



Outer joins and NULLs (old subject "ERROR: failed to find conversion function from key_vals_nn to record[]")

2022-06-19 Thread Bryn Llewellyn
> b...@yugabyte.com wrote:
> 
>> david.g.johns...@gmail.com  wrote:
>> 
>>> b...@yugabyte.com  wrote:
>>> 
>>> Can anybody show me an implementation of a realistic use case that follows 
>>> proper practice — like "every table must a primary key", "a foreign key 
>>> must refer to a primary key", and "joins may be made only "on" columns one 
>>> of which has a PK constraint and the other of which has a FK constraint" — 
>>> where using a not nullable data type brings a problem that wouldn't occur 
>>> if the column were defined with a nullable data type and an explicit "not 
>>> null" constraint?
>> 
>> Nothing obvious comes to mind. But frankly:
>> 
>> proper practice includes trying to write idiomatic code for the language you 
>> are using so others familiar with the language can learn your code more 
>> easily. You are violating this to an extreme degree.
>> 
>> I do not think it to be a good trade-off. SQL writers are practical people 
>> and the idioms largely avoid any downsides that arise from SQL not being 
>> some paragon of language design.
>> 
>> -- "\d genres" shows "gk" with a "not null" constraint, whether I write it
>> -- or not. And convention seems to say "don't clutter your code by writing 
>> it".
>> create table genres(
>>   gk  int   primary key,
>>   gv  text  not null
>>   );
>> 
>> "Primary Key" is defined to be the application of both UNIQUE and NOT NULL 
>> constraints...
> 
> Yes, I know what "primary key" implies. I meant only to emphasize that the 
> source column for what the "outer join" projects has a not null constraint, 
> that it doesn't apply to the projection of that column, that this is 
> perfectly understandable, and that this isn't a problem. Never mind.

Peter, your email:

www.postgresql.org/message-id/20220618064453.wtz4hxyeptwdh37z%40hjp.at

ended up in my junk folder. (This happens often, but randomly, with emails sent 
to pgsql-general—and I can't control it.) I read it quickly. And then when I 
returned to read it more carefully I managed to delete it—irrevocably.

The discussion has wandered so far from the original topic that it seemed just 
as well to start a new thread with this.

> hjp-pg...@hjp.at wrote:
> 
> But it would be a problem if there was an actual type which wouldn't include 
> NULL.
> 
> The NOT NULL attribute is an attribute of the column, not the type. When you 
> use the primary key (or any other column marked as NOT NULL) the type of the 
> result is just the type of that column, the NOT NULL is dropped.

It seems to me that my new testcase reveals one clear bug and one outcome that 
seems to me to be a bug. Others might argue that neither is a bug. Of course, 
I'll be happy to be corrected—especially if I made some typos in my SQL or 
misinterpreted what I saw.



Thank you for that example. It stimulated me to think harder than I had before 
and to design a brand new testcase. I didn't need to refer to your example when 
I wrote what follows.

I'm using the term "view" here as a shorthand for « the results from a "select" 
». And I used real views in my testcase to make the programming easier.

I copied my self-contained testcase below so that you can simply run it "as 
is". (You'll have to add the "drop" statements that you need.)

Here's my interpretation of the testcase output:

Self-evidently, a view does *not* inherit constraints from the columns of its 
base table(s).

A view on a single table doesn't necessarily inherit the data types of its base 
table's columns. Rather, the view compilation's analysis is *sometimes* clever 
enough to notice when a projected column might have a NULL even when the base 
column doesn't allow NULLs. In this case, if the base column's data type is (in 
my example) the domain "text_nn", then the corresponding column in the view is 
given the data type plain "text". My test that uses a single table shows this.

However, the compilation's analysis for a view on a join (at least when it's a 
two-table "outer join") slavishly inherits the data types from all of the 
referenced columns—even when the human can easily predict that some projected 
columns might have NULLs.

It seems to me that this optimistic design choice was unfortunate—and that a 
pessimistic choice would have been better:

— when the analysis cannot predict the outcome, replace the data type of 
*every* column that has a "not null domain" data type with the domain's base 
data type.

But I accept that this cannot be changed now. Might it be possible (in some 
future PG release) to make the analysis clever enough to deal with the issue at 
hand (as it already does in my single-table example)?

With "insert-select", you (self-evidently) define the data types and 
constraints of the target table's columns explicitly, reflecting your analysis 
of what you expect. Of course, then, the "insert-select" m

Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-19 Thread Bryn Llewellyn
> jian.universal...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> /* — START OF SPEC —— */
>> 
>> The document's top-level object may use only these keys:
>> 
>> ...
>> 
>> And the meaning of *not required* is simply "no information is available for 
>> this key" (with no nuances). The spec author goes further by adding a rule: 
>> this meaning must be expressed by the absence of such a key.
>> 
>> 
>> /* — END OF SPEC  */
> 
> create temp table source(
>   ...
>   CONSTRAINT test_jsonb_constraints1 check (book_info_text is json)
>   CONSTRAINT test_jsonb_constraints2 check 
> (JSON_EXISTS(book_info_text::jsonb,'$.title') )
>   ...
>   )
> );
> 
> Some of the problems I don't know how to solve. My intuition feels like that 
> isbn attribute in the json document column then enforcing the unique 
> constraint would be anti-pattern. So I put the isbn outside as a separate 
> column. Another constraint is that there are only certain keys in the jsonb. 
> I don't know how to implement it. But I feel like it's do-able.
> two columns, one text, another generated column stored jsonb, So there is a 
> duplication issue...
> 
> So there is another alternative way to do it...

Thank you very much for your suggestions, Jian. I'll try them and think 
carefully about how everything then looks over the next week or so.

Meanwhile, I hope that it's clear to all that I have nothing more than 
prototype code on my own laptop. I can change any aspect of it in no time when 
I come to see better alternatives. (In particular, I can easily heave out my 
use of "domains with not null constraints".)

My code, as it stands, does meet the goals that I set for it (esp. by meeting 
the "JSON → relational → JSON" idempotency requirement). I also handle the 
"required", or "not required", status of the attributes of the objects (and 
only specified keys present) quite easily. But I don't see any sign of "only 
specified keys present" in your code.

I haven't tried my prototype with large volumes of synthetic data—and nor have 
I considered performance at all. That can come later. (But with my current 
implementation, my tiny end-to-end test kit completes crazily quickly.)

I probably didn't say out loud that the kinds of updates and queries that the 
four table Codd-and-Date representation suggests can be supported with fairly 
ordinary SQL and not too much thought. However (at least as it feels to me), 
the corresponding operations on the native "jsonb" representation would be 
harder to design and write. Moreover, *any* change implies updating all of the 
indexes and re-checking all of the constraints.

In other words, my aim here is to treat JSON in the way that first motivated it 
(as a data transport format) and *not* in the way that it's often used (as a 
bucket for a set of noticeably heterogeneous documents). This is where the 
"JSON → relational → JSON" idempotency requirement comes from.

Sorry if I didn't make this clear.



Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

2022-06-19 Thread Bryn Llewellyn
> sham...@gmx.net wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Am I missing a use case where an object with a key-value pair with a JSON 
>> null value is meaningfully different from one where the key is simply absent?
> 
> It seems the JSON specification doesn't actually define equality. But the 
> JSON patch RFC 6902:
> 
> www.rfc-editor.org/rfc/rfc6902#section-4.6
> 
> defines the equality of two objects as:
> 
> «
> objects: are considered equal if they contain the same number of members, and 
> if each member can be considered equal to a member in the other object, by 
> comparing their keys (as strings) and their values (using this list of 
> type-specific rules)
> »
> 
> As {"x": 42, "y": null} and {"x": 42} do not contain the same number of 
> members, I think Postgres' behaviour is correct.

Thanks. I believe that I addressed this in a separate branch in this thread. 
The distinction (between « "some key": null » and "some key" absent), which by 
all means matters in some contexts, does not matter w.r.t. the requirements of 
my (demo) app. And I've now proved (to myself, at least) that I can meet my 
goal easily with existing Postgres features—especially "jsonb_strip_nulls()".
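
For the record, the built-in one-liner that does the job here (using the value 
from this thread's subject):

  select jsonb_strip_nulls('{"x": 42, "y": null}'::jsonb);

This returns {"x": 42}.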

I hope that this means that we're all happy.



Re: Outer joins and NULLs (old subject "ERROR: failed to find conversion function from key_vals_nn to record[]")

2022-06-27 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> Allowing domains to be defined as not null at this point is simply something 
> that we don't support but don't error out upon in the interest of backward 
> compatibility. (IMO, the documentation is not this strongly worded.) It, as 
> you note, has some corner-case bugs. You can avoid those bugs by simply not 
> using a non-null constraint as suggested.
> 
> ...At most we should probably go from saying "Best practice therefore..." to 
> "We no longer support setting a not null constraint on a domain but will not 
> error in the interest of not breaking existing uses that are careful to avoid 
> the problematic corner-cases”.

Thank you all for helping me see the problem here. I’d certainly welcome 
strengthening the doc’s admonition to use wording like "unsupported", 
"unpredictable results", and "just don’t do this".

I was simply confused. There's no other way to say it. Anyway, my foolish use 
of domains with "not null" constraints hadn't left the privacy of my own 
laptop—and I've expunged all those uses now.

Here’s what I now take from that "create domain" note:

When a datum (either an actual value or a NULL) is copied from a "container" 
whose data type is the domain "d" to another container with that same data 
type, the constraints that might have been defined for "d" are not re-checked.

I'm using "container" here as an ad hoc, and somewhat loose, umbrella term of 
art for any of these:

—row-column intersection in a table (or view)
—an attribute of a composite type instance
—a local variable in a PL/pgSQL subprogram or anonymous block 
—a PL/pgSQL subprogram's formal parameter
—a PL/pgSQL subprogram's return datum
—and so on

I quite like this compact illustration of the paradox. (I expect that everybody 
has their own favorite.)

create domain text_nn as text not null;

create view null_having_null_not_constraint(t_nn, dt) as
with
  c1(t_nn) as (
    values('dog'::text_nn)),

  c2(t_nn) as (
    select (select t_nn from c1 where null))

select
  t_nn, pg_typeof(t_nn)
from c2;

\pset null ''
select t_nn, dt from null_having_null_not_constraint;

This is the result:

 t_nn |   dt
------+---------
      | text_nn

And I quite like this demo of the fact that copying a datum between containers 
with the same constrained domain data type doesn't re-check the constraints:

do $body$
declare
  t_var_nn text_nn := '';
begin
  t_var_nn := (select t_nn from null_having_null_not_constraint);
  assert (t_var_nn is null), 'Expected "t_var_nn" (paradoxically) to be NULL here';
end;
$body$;

I'll use « the "paradoxical" pattern », below, to denote the fact that you can 
find a NULL in a container whose datatype has a "not null" constraint.

Here's the summary of my tests:

 (1) select t_nn, dt from null_having_null_not_constraint;  | Follows the "paradoxical" pattern
 (2) select t_nn... into text_nn local variable             | Follows the "paradoxical" pattern
 (3) select t_nn... into text_nn field in composite type    | Follows the "paradoxical" pattern
 (4) assign (select t_nn...) to text_nn IN formal           | Follows the "paradoxical" pattern
 (5) returning t_nn in function that returns "text_nn"      | Follows the "paradoxical" pattern

User's responsibility when using a chain of "immutable" functions?

2022-06-28 Thread Bryn Llewellyn
I’ve copied my self-contained testcase at the end.

I create three functions, marking each of them "immutable". "f1()" simply 
returns the manifest constant 'dog'. So it seems perfectly honest to mark it as 
I did. "f2()" simply returns "f1()"—so, here too, it seems that I'm being 
honest. But I do see that I'm using human reasoning, and that Postgres cannot 
check that I'm right. In the same way, and with the same reasoning for my 
marking, "f3()" returns "f2()".

Then I do this:

select rpad('at start', 30) as history, f1(), f2(), f3();

Then I drop, and then re-create "f1()", now returning 'cat', and do this:

select rpad('after drop, re-create f1()', 30) as history, f1(), f2(), f3();

Finally, I create-and-replace "f3()", using the identical source text, and do 
this:

select rpad('after create-and-replace f3()', 30) as history, f1(), f2(), f3();

Here's what I see when I run my .sql script:

            history             | f1  | f2  | f3
--------------------------------+-----+-----+-----
 at start                       | dog | dog | dog
 after drop, re-create f1()     | cat | cat | dog
 after create-and-replace f3()  | cat | cat | cat

I understand that the possible session-duration caching that I allow with 
"immutable" doesn't guarantee that I'll get any caching. But I had expected a 
cascade purge of anything that was cached when "f1()" was dropped.

Something seems odd to me: if I do my "select f1(), f2(), f3()" after dropping 
"f1()" (i.e. before re-creating it) then I get an ordinary error saying that 
"f1()" doesn't exist. So it seems that Postgres does understand the dynamic 
dependency chain—even when the result from "f3()" is cached. If I then recreate 
"f1()" to return 'cat', I get no error—but, same as in my straight-through 
test, "f3()" continues to return its cached (and now "wrong") result.

Should I simply understand that when I have such a dynamic dependency chain of 
"immutable" functions, and should I drop and re-create the function at the 
start of the chain, then all bets are off until I drop and re-create every 
function along the rest of the chain?


-- testcase.sql

\t off

drop function if exists f3() cascade;
drop function if exists f2() cascade;
drop function if exists f1() cascade;

create function f1()
  returns text
  immutable
  language plpgsql
as $body$
begin
  return 'dog';
end;
$body$;

create function f2()
  returns text
  immutable
  language plpgsql
as $body$
begin
  return f1();
end;
$body$;

create function f3()
  returns text
  immutable
  language plpgsql
as $body$
begin
  return f2();
end;
$body$;

select rpad('at start', 30) as history, f1(), f2(), f3();

\t on

drop function f1() cascade;
create function f1()
  returns text
  immutable
  language plpgsql
as $body$
begin
  return 'cat';
end;
$body$;

select rpad('after drop, re-create f1()', 30) as history, f1(), f2(), f3();

create or replace function f3()
  returns text
  immutable
  language plpgsql
as $body$
declare
  t1 constant text := f2();
begin
  return t1;
end;
$body$;

select rpad('after create-and-replace f3()', 30) as history, f1(), f2(), f3();

\t off



Re: User's responsibility when using a chain of "immutable" functions?

2022-06-28 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> x...@thebuild.com wrote:
>> 
>>> b...@yugabyte.com wrote:
>>> 
>>> Should I simply understand that when I have such a dynamic dependency chain 
>>> of "immutable" functions, and should I drop and re-create the function at 
>>> the start of the chain, then all bets are off until I drop and re-create 
>>> every function along the rest of the chain?
>> 
>> 
>> Yes.
>> 
>> You don't have to drop and recreate the functions, though. DISCARD PLANS 
>> handles it as well:
> 
> Specifically:
> 
>> select f1(), f2(), f3();
>>  f1  | f2  | f3  
>> -+-+-
>>  cat | cat | cat
> 
> 
> The pl/pgsql plan cache now contains the following:
> 
> SELECT f1() => 'cat'
> SELECT f2() => 'cat'
> 
>> drop function f1();
> 
> Now the cache only contains:
> 
> SELECT f2() => 'cat'
> 
> The f1 plan has been invalidated due to the drop/replace action on the f1 
> function
> 
>> create function f1() returns text as $$ begin return 'dog'; end $$ language 
>> plpgsql immutable;
>> select f1(), f2(), f3();
>>  f1  | f2  | f3  
>> -+-+-
>>  dog | dog | cat
> 
> And so f3()'s invocation of "SELECT f2()" yields 'cat' from the cache since 
> that one hasn't been invalidated. While f2() replans its f1() invocation and 
> thus returns 'dog'
> 
> The fundamental limitation here is that there really is no attempt being made 
> to deal with inter-functional dependencies. Their bodies are blackboxes 
> (...wonders how this resolves in the new SQL Standard Function Bodies 
> implementation...) and no explicit dependency information is recorded either. 
> So we don't know that the saved plan for f2() depends on a specific version 
> of f1() and thus if f1() is changed plans involving f2() should be 
> invalidated along with plans involving f1(). Nor is there sufficient 
> recognized benefit to doing so.

DISCARD PLANS is unsafe in a multi-user concurrent scenario. The doc says 
explicitly that its scope is just the single session. And it's easy to show the 
danger by using my testcase manually, step by appropriate step, with two 
concurrent sessions.

However, you said (indirectly) that the session-duration caching is a red 
herring—and that the real danger comes with an expression-based index that 
involves a PL/pgSQL function. I agree.

PG's lack of dependency tracking shows up with just a "worker" function f1() 
and a "jacket" function f2() when you base the index on f2(). You can happily 
drop and recreate f1() with a new implementation while the index lives on. (For 
the reasons that we've mentioned, the "2BP01: cannot drop function... because 
other objects depend on it" error doesn't occur.)
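
Here's a sketch of that scenario (with ad hoc names; it's not the exact testcase 
from my earlier email):

create table t(v text);

create function f1(v in text)
  returns text
  immutable
  language plpgsql
as $body$
begin
  return lower(v);
end;
$body$;

create function f2(v in text)
  returns text
  immutable
  language plpgsql
as $body$
begin
  return f1(v);
end;
$body$;

create index t_f2_idx on t(f2(v));  -- pg_depend records only "index depends on f2()"

drop function f1(text);             -- succeeds: f2()'s call of f1() isn't tracked

create function f1(v in text)
  returns text
  immutable
  language plpgsql
as $body$
begin
  return upper(v);                  -- new, different behavior
end;
$body$;

-- "t_f2_idx" lives on, built with the old f1(); rows indexed from now on will
-- disagree with the existing entries unless the index is rebuilt.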

I've concluded that the only practical practice for "immutable" is to reserve 
its use for functions that don't mention even a single user-created artifact.

Moreover, this "hermetic" property of a to-be-immutable function can be 
established only by human analysis of the function's source code.



Re: User's responsibility when using a chain of "immutable" functions?

2022-06-29 Thread Bryn Llewellyn
> x...@thebuild.com wrote:
> 
> I think "not allowed" is putting it too strongly. It would be a bit much to 
> ask that every single user-written immutable function be 100% perfect when it 
> is rolled out, and never have to fix any bugs in them. However, you 
> definitely *do* have to understand that there are administrative consequences 
> for doing so, like rebuilding indexes and invalidating session caches. I 
> think that the OP's statement that you can't ever use user-defined functions 
> from an immutable function is too strong, too; you need to be aware of the 
> consequences if you change an immutable function in a way that alters the 
> return result for a previously-valid set of arguments.

My email that started this discussion has prompted a lot of answers in a few 
branches. This means that it's too hard for me to respond carefully to 
everything that's been said. But it does seem that there are a few different 
opinions about how safety might be ensured in the face of wrong results risks 
and what, therefore, might define proper practice.

I realize, now, that I didn't distinguish between:

(1) What you might do (with some caution and attention to detail) in the 
development shop; and

(2) What you might do when patching the database backend of a deployed 
production system.

Case #2 is arguably clear cut—as long as you accept that there's no such thing 
as safe hot patching (unless it's Oracle Database and you have edition-based 
redefinition). So, accepting this, you have to quiesce the system and make all 
your changes in a self-imposed single-session fashion. Of course, you have to 
pay attention to expression-based indexes. But anyone who adopts my maximally 
cautious approach of allowing only hermetic "immutable" functions and who never 
uses "create or replace" will find that the index safety risk looks after 
itself.

And case #1 is arguably less of a concern—unless it compromises your regression 
testing.

Anyway...

PG has a very lightweight scheme for dependencies that tracks just a few 
cases—like the dependence of an expression-based index that references a 
user-defined function upon that function. But, significantly, 
function-upon-function dependence (like my testcase showed) is not tracked. 
This is a non-negotiable fundamental property of PG.

It's worth noting that (at least as I have understood it) the "immutable" 
notion is simply a hint that gives PG permission to cache results rather than 
to calculate them afresh on every access. And that this is a device that seeks 
to improve performance. But significantly, there's no robust cache invalidation 
scheme (and nor could there be) so this leaves it to the user to promise safety.

There's no possible discussion about the trade-off between performance and 
correctness. So this argues for, at least, huge caution when you think that you 
might mark a function "immutable". Having said this, I noted this from 
pavel.steh...@gmail.com:

> I know so many hard performance issues are related to missing STABLE or 
> IMMUTABLE flags of some functions.


A caveat is needed because you're not allowed to reference a user-defined 
function in an expression-based index unless it's marked "immutable". But this 
ultimately is no more than a formal prick to the user's conscience. Nothing 
stops you from lying through your teeth in this scenario.

It was all these considerations that led me to my proposal for *my own* 
practice:

(1) Never use "create or replace" to change an "immutable" function—but, rather 
always use "drop" and a fresh bare "create".

(2) Never mark a function "immutable" unless its behavior is determined 
entirely by its own source text. (I'll say this as "unless the function is 
hermetic".) This notion accommodates use of some built-in functions (but even 
there, caution is needed because of how session settings can affect the 
behavior of some built-ins) but it most certainly prohibits any reference to 
user-defined artifacts (including to other "immutable" functions.) It also 
prohibits catalog queries.

Both david.g.johns...@gmail.com and x...@thebuild.com have argued that my 
stance is overly cautious. Fair enough. They can make their own minds up. But 
what convinces me is the complete lack of static dependencies and the 
infeasibility of discovering all dynamic dependencies by exhaustive human 
analysis of source text when the only help you have is a catalog query to 
identify all "immutable" functions in a database and another to identify all 
indexes that reference a function. (But here, as far as I can tell, you need 
human inspection to spot the subset that are user-defined functions.)
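
Hedged sketches of the two catalog queries that I have in mind (the filters are 
my own ad hoc choices):

-- every non-system function that's marked "immutable"
select p.oid::regprocedure as immutable_function
from pg_proc p
join pg_namespace n on n.oid = p.pronamespace
where p.provolatile = 'i'
and   n.nspname not in ('pg_catalog', 'information_schema');

-- every index, together with any function that its expressions (or predicate) reference
select c.relname as index_name, p.oid::regprocedure as referenced_function
from pg_depend d
join pg_class c on c.oid = d.objid and c.relkind = 'i'
join pg_proc  p on p.oid = d.refobjid
where d.classid    = 'pg_class'::regclass
and   d.refclassid = 'pg_proc'::regclass;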

One more thing...

I see now that I didn't clearly describe an outcome shown by the testcase that 
I copied in my email that started this thread. Here's a sufficient, shortened, 
version. The setup is what I showed you before.

When I drop the first link, f1() in the chain of "immutable" functions, I see 
that I cannot invoke f

Re: User's responsibility when using a chain of "immutable" functions?

2022-06-29 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> When I drop the first link, f1() in the chain of "immutable" functions, I 
>> see that I cannot invoke f(2) because it now complains that f1() doesn't 
>> exist. This surprises me because, after all, the result of f2() is now 
>> cached (at least as I suppose) and its body isn't executed to produce the 
>> result. This outcome almost suggests that there is, after all, a dependency 
>> tracking scheme at work.
>> 
>> Yet I can still invoke the third link, f(3), and it still does produce the 
>> value that it had cached!
> 
> The “cache” is just a prepared statement plan. You didn’t create any of those 
> yourself at the top SQL context so you don’t see caching effects in the stuff 
> you execute in SQL directly.

Thanks, Got it.

> PL/pgSQL, however, creates prepared statement plans for any internal SQL it 
> executes (i.e., it compiles the function). That is the caching artefact you 
> are seeing and why I mentioned pl/pgsql in my reply where I described why you 
> saw the results you did.

Thanks again. Got this, too, now.

> IOW, when you write : "select f2();" in SQL, f2() is always called, it is 
> never directly replaced with a cached value. The documentation does say this 
> though I lack the relevant paragraph reference at the moment.

Ah... but where is it!

Thanks, David. It's clear that my mental model has been missing at least one 
critical piece. (My instincts have been conditioned by too many years with 
Oracle Database to realize when I'm making some wrong assumption in a PG 
context.)

Anyway... armed with this new knowledge, I can write a much simpler demo of 
the risk brought when an "immutable" function calls another, using just two 
functions.

I copied this below. But then I made a slight change. And this brought an error 
that I can't explain. Any ideas?

Meanwhile. I'll appeal for some pointers to what I should read:

I *had* understood that the SQL that a user-created subprogram issues (at least 
for "language sql" and "language plpgsql") is implicitly prepared. But I've no 
idea what it uses as the "handle" for such a prepared statement. Might it be, 
for example, a node in the AST that represents the subprogram or anonymous 
block in my session? In the same way, I've no idea what the outcome is when two 
different subprograms issue the identical (or identical post-canonicalization) 
SQL statement text. I don't know how to search the PG doc to find the 
explanations that I need. For example "pl/pgsql execution model" gets just a 
single hit, in a piece about locking. Wider Internet search gets too much 
noise, and too much unreliable stuff from self-appointed experts, to be of much 
help. I was excited to find "Plpgsql_internals.pdf" by pavel.steh...@gmail.com. 
But disappointed to find that it didn't answer my questions (and nor could it 
when it explains things w.r.t the C implementation).

Does anybody have any recommendations for what I might study?

Of course, I started with the account of "immutable" in the "create function" 
doc:

«
IMMUTABLE indicates that the function cannot modify the database and always 
returns the same result when given the same argument values; that is, it does 
not do database lookups or otherwise use information not directly present in 
its argument list. If this option is given, any call of the function with 
all-constant arguments can be immediately replaced with the function value.
»

"immediately replaced with the function value" implies a cache. But not when 
it's populated (and when it isn't) or what its lifetime might be. It certainly 
doesn't mention cache invalidation.

Of course, I read the account in the "prepare" doc too. That says nothing to 
the effect that the result of prepare, when the SQL statement references an 
"immutable" function, is to build a cache whose key is the vector of input 
actuals and whose payload is the corresponding return value. Nor does it 
mention the cache's capacity and what happens when (if) the cache becomes full.

About your comment thus:

> "select f2();" in (explicit) SQL, f2() is always called, it is never directly 
> replaced with a cached value 


I suppose that this can be deduced from the fact that the cache mechanism is 
the prepared statement (not that this latter point is spelled out).


-- TEST ONE. FITS MY MENTAL MODEL. NICELY SHOWS THE "WRONG RESULTS" RISK.

create function f1()
  returns text
  immutable
  language plpgsql
as $body$
begin
  return 'dog';
end;
$body$;

create function f2()
  returns text
  immutable
  language plpgsql
as $body$
begin
  return f1();
end;
$body$;

prepare q1 as select f1(), f2();
execute q1; --<< Gets "dog | dog"

prepare q2 as select f2();
execute q2; --<< Gets "dog"

/*
  Presumably dropping f1() invalidates q1 but
  leaves its definition intact so tha

Re: User's responsibility when using a chain of "immutable" functions?

2022-06-30 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> b...@yugabyte.com wrote:
>> 
>> Meanwhile. I'll appeal for some pointers to what I should read...
> 
> I tend not to search...or at least that isn't my first (or at least only) 
> recourse. The pg/pgsql chapter has a subchapter named "Plan Caching":
> 
> https://www.postgresql.org/docs/current/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING
> 
> You really need to read the "see related" reference there to get the level of 
> detail that you want:
> 
> https://www.postgresql.org/docs/current/xfunc-volatility.html
> 
> "This category allows the optimizer to pre-evaluate the function when a query 
> calls it with constant arguments."
> 
> The implication is that this operation is not session-scoped but 
> query-scoped. Other parts of the page reinforce this.  Not saying it is 
> perfect wording but I came by my understanding pretty much exclusively from 
> this documentation.

Thank you very much for the doc pointers, David. I believe that I have all I 
need, now. I understood already that "giving permission to cache" doesn't mean 
that PG will actually cache anything. I wanted only to find a compelling 
example of how lying when you mark a function "immutable" can bring wrong 
results. I think that this is sufficient:

set x.a = '13';

create function dishonestly_marked_immutable(i in int)
  returns int
  immutable
  language plpgsql
as $body$
begin
  return i*(current_setting('x.a')::int);
end;
$body$;

prepare q as
select
  dishonestly_marked_immutable(2) as "With actual '2'",
  dishonestly_marked_immutable(3) as "With actual '3'";

execute q;

set x.a = '19';
execute q; --<< Produces the stale "26 | 39".

discard plans;
execute q; --<< Now produces the correct "38 | 57"





Seems to be impossible to set a NULL search_path

2022-07-05 Thread Bryn Llewellyn
The section "Writing SECURITY DEFINER Functions Safely":

https://www.postgresql.org/docs/current/sql-createfunction.html#id-1.9.3.67.10.2

explains the risk brought if a bad actor creates an object that pre-empts what 
the developer intended by putting it in a schema that's ahead of the intended 
object in the search_path.

You can avoid this risk by always using fully qualified object names. It seems 
strange that the section doesn't mention this obvious approach. Is it 
vulnerable to subversion in a way that I haven't spotted?
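
(By "fully qualified", I mean this kind of style; a minimal sketch with ad hoc 
names of my own:)

create schema app;
create table app.orders(id int primary key, note text);

create function app.add_order(p_id in int, p_note in text)
  returns void
  security definer
  language plpgsql
as $body$
begin
  -- the table reference is schema-qualified, so nothing placed earlier on the
  -- search_path can capture it
  insert into app.orders(id, note) values(p_id, p_note);
end;
$body$;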

I suppose that there are use cases where the actual plan is to resolve to the 
first object that has the right name as the search_path is traversed. (But this 
seems not to be the common case.) This is where setting the search_path as an 
attribute of a subprogram helps.

I wondered about a self-documenting belt-and-braces approach: use fully 
qualified object names in the subprograms source code and declare that I want 
no risk of mis-use of the search_path by setting it to null. But this seems not 
to be possible. Am I right?

I've confirmed that even a superuser cannot create objects in a "special" 
schema like "pg_catalog" or "pg_toast". So this gives me a workaround to the 
limitation that I cannot force the use of fully qualified names by setting a 
null search_path: I could set the attribute of my subprogram to "pg_catalog".

Apart from the fact that, as I suppose, this would be a rare and therefore 
possibly puzzling pattern (so clear doc about the purpose would be needed), are 
there any risks that I haven't spotted?

Finally, what do you think of a possible future enhancement to allow setting a 
null search_path?



Re: Seems to be impossible to set a NULL search_path

2022-07-05 Thread Bryn Llewellyn
>>> adrian.kla...@aklaver.com wrote:
>>> 
>>> set search_path = '';
>>> show search_path ;
>>> search_path
>>>-
>>>   ""
>> pavel.steh...@gmail.com
>> 
>> ...But still in this case, there is pg_catalog in search path.
> 
> Yes but from OP:
> 
> « I've confirmed that even a superuser cannot create objects in a "special" 
> schema like "pg_catalog" or "pg_toast". So this gives me a workaround to the 
> limitation that I cannot force the use of fully qualified names by setting a 
> null search_path: I could set the attribute of my subprogram to "pg_catalog". 
> »
> 
> So Bryn Llewellyn does not seem to be concerned about that.

Thanks, all, for your replies. I'd assumed that the arguments of "set 
search_path" had to be SQL names. so I tried "". But that caused an error. I 
didn't try the ordinary empty string because I'd assumed that, as an illegal 
SQL name, it would be illegal in "set search_path". Hmm...

I'm slightly troubled by "works right now":

> x...@thebuild.com wrote:
> 
> You use the empty string, rather than NULL... it works right now: 

But because you experts all recommend it, I'll go with it. It's more expressive 
of what I mean than is using "pg_catalog".

(I'm assuming that having such an unwritable schema inevitably on the 
search_path is simply an entirely benign benefit. But it could give a strange 
message to the reader in my use case.)

Re: Seems to be impossible to set a NULL search_path

2022-07-05 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
>> b...@yugabyte.com writes:
>> 
>> ...I'd assumed that the arguments of "set search_path" had to be SQL names...
> 
> search_path's value is not a SQL name. It's a list of SQL names wrapped in a 
> string ... and the list can be empty.

I was informed by this precedent:

truncate table u1.t1, t2;

It uses a comma-separated list of optionally qualified SQL names.

And this:

«
CREATE SCHEMA schema_name ...
»

It requires a single unqualified SQL name.

And then this:

«
SET [ SESSION | LOCAL ] configuration_parameter { TO | = } { value | 'value' | 
DEFAULT }

value — New value of parameter. Values can be specified as... identifiers... or 
comma-separated lists of these, as appropriate for the particular parameter...
»

Notice that the token « value » in the syntax is not in quotes. I took it to 
mean (for "set search_path") a SQL name that you would type bare when it's 
simple and surround with double quotes when it's exotic—in line with the much 
broader general rule.

And I did ad hoc tests like these.

create schema "s1, s2";
create table "s1, s2".t(k int);
insert into "s1, s2".t(k) values(42);
set search_path = "s1, s2";
select k from t;

After all, "s1, s2" is a perfectly legal SQL name—even though folks usually 
have rules of practice to avoid exotic names like these.

I saw that the test behaves the same if I use this:

set search_path = 's1, s2';

I put that down to an unnecessary and confusing forgiveness that got 
grandfathered in.

So I'm very confused by your comment. What am I missing?

> A bit off topic: I'm not sure how you came to the conclusion that superusers 
> can't write into pg_catalog.  They can.


With tests like these:

\c demo postgres
create table pg_catalog.t(n int);

It fails with this:

42501: permission denied to create "pg_catalog.t"

I did note this detail: "System catalog modifications are currently 
disallowed." Is there a configuration parameter that controls this?

> I don't see much point in being paranoid... if an adversary has already 
> obtained superuser privileges

Yes, that point is very well taken. But I like to know the limit's of what's 
technically possible.



Re: Seems to be impossible to set a NULL search_path

2022-07-05 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> t...@sss.pgh.pa.us wrote:
>> 
>> search_path's value is not a SQL name.  It's a list of SQL names wrapped in 
>> a string ... and the list can be empty.
> 
> This doesn't seem to be correct - wrapping them in single quotes in the SET 
> command ends up behaving as if you wrapped them in double quotes anywhere 
> else (and wrapping them individually in double quotes here works just fine 
> too).

And then...

> adrian.kla...@aklaver.com wrote:
> 
> Those are creating objects. Set search_path is setting a configuration value. 
> Pretty sure it is:
> 
> { TO | = } { value | 'value' | DEFAULT 

There are different use cases. For example:

set my_namespace.x = 'Dog house';
show my_namespace.x;

I can't reconcile what you three (Tom, David, and Adrian) have said. I'm 
interested to hear how you interpret what I showed in this reply:

https://www.postgresql.org/message-id/48E1391E-5A21-4736-B4B1-8B9468ECAFD4%40yugabyte.com

and in particular to this:

create schema "s1, s2";
create table "s1, s2".t(k int);
insert into "s1, s2".t(k) values(42);
set search_path = "s1, s2";
show search_path;
select k from t;

OR (with single quotes in "set search_path":

create schema "s1, s2";
create table "s1, s2".t(k int);
insert into "s1, s2".t(k) values(42);
set search_path = 's1, s2';
show search_path;
select k from t;

I get a resounding 42 in both cases. Now try this:

set search_path = no_such_schema, "No Such Schema";
show search_path;

All outcomes accord with the mental model that you tell me is wrong.



Re: Seems to be impossible to set a NULL search_path

2022-07-06 Thread Bryn Llewellyn
> adrian.kla...@aklaver.com wrote:
> 
> Not sure what your point is?

Try these two:

set timezone = 'America/New_York';
show timezone;

set timezone = "America/New_York";
show timezone;

Neither causes an error. The "show", in each case, prints the bare value with 
no quotes. It never struck me to try double quotes around the timezone 
argument. I'm shocked that they are silently accepted here and seem to have the 
same effect (in this syntax setting) as single quotes. (I realize that quoting 
is required.) This is a "solution" to a problem that I don't have—and it brings 
confusion. Try this:

set search_path = '"x"'; -- Hard to read. It's double-quoted x surrounded by 
single quotes.
show search_path;

This is the result:

 search_path
-------------
 """x"""

That's a run of three double quotes each side of x. (For sport, try a 
single-quoted x surrounded by double-quotes.) I have no idea what this means—or 
why it's allowed. But the fact that the quoting has different effects in my 
different examples led me to say what I did.

When I read "string" in doc, then I understand that a manifest string constant 
(like the timezone setting that I used) must be single-quoted in SQL statements 
and the like. Not double-quoted.

All this seems to be 100% distinct from the discussion of exotic SQL names that 
have to be double-quoted in SQL syntax where a simple name doesn't need this 
(but can be double quoted if you want).

These two produces different outcomes:

select 'cat';
select "cat";

I won't risk explaining why I'm not at all surprised.

Enough! I'll assume that we're talking past each other. But the cost/benefit of 
trying to work out why comes out wrong—so I'll stop. Thanks for trying to help 
me.



Re: Seems to be impossible to set a NULL search_path

2022-07-06 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> unless you are going to write: operator(pg_catalog.=) in your function the 
> advice to always use schema qualifications is not going to be taken 
> seriously... the correct search_path to set isn't "empty" but "pg_catalog", 
> "pg_temp".  While this does violate "DRY" principles it is the solution you 
> are looking for.
 
The advice always to use fully qualified names is orthogonal to the advice to 
set an explicit search_path in a subprogram's definition. Am I right that your 
point is about what search path to set and has nothing to do with the possible 
use of qualified names?

I do see that I can achieve my goal by setting the search_path to pg_catalog, 
pg_temp in my functions rather than to empty. So I can adopt that practice. But 
it's at odds with an example in this section:

Writing SECURITY DEFINER Functions Safely
https://www.postgresql.org/docs/current/sql-createfunction.html#id-1.9.3.67.10.2

The example sets a search path the does *not* mention pg_catalog.

I hadn't come across this locution:

operator(pg_catalog.=)

But I found an account here:

4.2.5. Operator Invocations
https://www.postgresql.org/docs/current/sql-expressions.html#SQL-EXPRESSIONS-OPERATOR-CALLS

To be sure, I tried this:

select 17 operator(pg_catalog.=) 17;

It ran without error and produced TRUE.

I believe that you're saying that because (as I presume) the "normal" 
definition of the = operator is done by code in the pg_catalog schema, 
pg_catalog must be on the path for ordinary equality tests to work. (I also 
found random folks saying this on StackExchange.) So I tried this:

create function f()
  returns boolean
  set search_path = ''
  security definer
  language plpgsql
as $body$
begin
  return (2 + 5) = 7;
end;
$body$;
select f()::text;

It succeeded and said true. I don't know how to explain this outcome. For good 
measure, I tried this too:

create procedure p()
  set search_path = ''
  security definer
  language plpgsql
as $body$
begin
  create temporary table x(k int);
end;
$body$;
call p();

It succeeded. And the \d metacommand showed me that I now have a table 
pg_temp_3.x. Using a different database, it ends up in "pg_temp_1. What's going 
on here? Is "pg_temp" a kind of generic nickname for ANY "pg_temp_N"?

I hate to realize that I'm failing to understand a fundamental principle.

It rather looks like the name-res to pg_catalog and (some) pg_temp is 
hard-coded and doesn't rely on the reigning search_path. Or, to put it another 
way, these two schemas are inevitably at the end of the search_path no matter 
what you set explicitly, and never mind that "show search_path" doesn't show 
them unless you also put them on the path (again) explicitly.

I can't make sense of this wording from "Writing SECURITY DEFINER Functions 
Safely":

« A secure arrangement can be obtained by forcing the temporary schema to be 
searched last. To do this, write pg_temp as the last entry in search_path. »

If I do this:

set search_path = 'pg_catalog, pg_temp';
show search_path;

Then I see what I set—in that order. But if I set the search_path to empty (and 
don't see pg_catalog or pg_temp with "show") PG behaves as if they're still 
there. Not only as my f() and p() above show. But even, say, "select count(*) 
from pg_class". Moreover, this is allowed too:

set search_path = 'pg_temp, pg_catalog, pg_temp';
show search_path;

Now I see exactly what I set. It seems strange that this is allowed. How does 
the implementation handle this when a to-be-resolved name exists nowhere? Does 
it just crank on, repeatedly searching where it already failed, right up to the 
bitter end?

Here's another test whose outcome surprises me and seems to be at odds with 
what you're saying and what the "Writing SECURITY DEFINER Functions Safely" 
section says:

select count(*) from pg_class; -- 399
create temporary table pg_class(k int);
select count(*) from pg_class; -- 0
set search_path = 'pg_catalog, pg_temp';
select count(*) from pg_class; -- STILL 0

Why does the final "select" show that the temp table's name has still captured 
the one in pg_catalog even though it's ahead in the path.

Might I trouble you to explain more carefully—or to give me a doc reference 
that will allow me to see why the equality function in my example f() is found 
when the path for the function is empty? And why my example procedure p() 
manages to create a temporary table while pg_temp is not on the search_path?

> The main problem is that by doing search_path manipulation on the function 
> you prevent inlining it into the main query.


The "Writing SECURITY DEFINER Functions Safely" section explicitly recommends 
that a subprogram includes a "set search_path" specification. But, as I read 
it, you're saying that this advice is wrong (at least when a function will be 
invoked in more than a bare "select") because it prevents inlining.

How should I resolve these two conflicting pieces of advice?



Re: Seems to be impossible to set a NULL search_path

2022-07-06 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> At the level of discussion you want to have when you encounter unfamiliar 
> syntax please read the syntax chapter for the related concept (expression 
> identifiers).
> 
> https://www.postgresql.org/docs/current/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS
 
Thanks. That section makes a valiant attempt to distinguish between 
"identifier" (as a token in a SQL statement or, say PL/pgSQL source code) and 
"name" as what the identifier denotes. But (I think) it slips up here:

«
A convention often used is to write key words in upper case and names in lower 
case, e.g.:

UPDATE my_table SET a = 5;
»

It should be « to write key words in upper case and unquoted identifiers in 
lower case », yes?

About my

 search_path
-------------
 """x"""

yes, I was confused—by, for example, this precedent.

create table "He said ""dog"", I think."(k int);
\d

I see this:

               List of relations
 Schema |          Name           | Type  | Owner 
--------+-------------------------+-------+-------
 u1     | He said "dog", I think. | table | u1

So here the convention is to list the name of the table and not the identifier 
that denoted the intended name at creation time.

The situation is analogous with setting the search_path. I want to talk about 
schemas. Schemas have names. So in SQL syntax, I must denote these names by 
writing identifiers. It could have been decided that the proper way to display 
a search_path is by listing the schema names (just as \d does for tables). But 
it was decided, instead, to denote the path by the list of identifiers that 
denote the schema names. This doesn't present a huge usability challenge. But 
it is, nevertheless, a rule that you have to learn (which I had) and then 
remember (which I didn't).







Re: Seems to be impossible to set a NULL search_path

2022-07-06 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote
>> ...
> 
> You either didn't read or failed or retain knowledge of the words in the 
> documentation that are the canonical reference for search_path and explain 
> exactly this. I suggest you (re-)read them.
> 
> https://www.postgresql.org/docs/current/runtime-config-client.html#RUNTIME-CONFIG-CLIENT-STATEMENT
>   (under search_path)
> 
> And elsewhere I'm sure it is written that since temporary objects are 
> session-local it was decided that a useful implementation detail for that 
> would be for each session to have its own temporary schema, hence the 
> appended integer to distinguish them (referencing pg_temp works, the system 
> resolves the session specific schema name for you).
 
Right. Mea maxima culpa. « the current session's temporary-table schema... can 
be explicitly listed in the path by using the alias pg_temp. »

It seems that the wording is wrong here:

« The value for search_path must be a comma-separated list of schema names. »

It's schema identifiers—and not schema names. Yes?

>> Here's another test whose outcome surprises me...
> 
> Remember that session scoped relation cache we went on about a little while 
> back...I think that by creating the object you got a cache invalidation but 
> simply changing the search_path does not cause a cache invalidation.

The problem was my stupid typo: writing « set search_path = 'pg_catalog, 
pg_temp'; » when I should *not* have typed those single quotes. Now the demo 
that I'd intended gets the outcome that I'd expected:

select count(*) from pg_class; --<< 399
create temporary table pg_class(k int);
select count(*) from pg_class; --<<   0
set search_path = pg_catalog, pg_temp;
select count(*) from pg_class; --<< 400

>> The "Writing SECURITY DEFINER Functions Safely" section explicitly 
>> recommends that a subprogram includes a "set search_path" specification. 
>> But, as I read it, you're saying that this advice is wrong (at least when a 
>> function will be invoked in more than a bare "select" because it prevents 
>> inlining.
>> 
>> How should I resolve these two conflicting pieces of advice?
> 
> There is no "conflict" - you basically get to choose safety or performance. 
> Though since performance isn't guaranteed nor always a need I would say 
> choose safety unless you've confirmed that you need performance.

Thanks, that's crystal clear now.

Re: Seems to be impossible to set a NULL search_path

2022-07-06 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> «
>> A convention often used is to write key words in upper case and names in 
>> lower case, e.g.:
>> 
>> UPDATE my_table SET a = 5;
>> »
>> 
>> It should be « to write key words in upper case and unquoted identifiers in 
>> lower case », yes?
> 
> I would say: ... , and identifiers without quotes and in lower case
 
Yes, that works.

> An "identifier" is supposed to exist in the system and when you use it that 
> existence leads you to the object that is named.  search_path accepts labels 
> that aren't true identifiers because they don't have to exist in the system.

I think that you and I differ on this point. As I see things, the "identifier" 
notion belongs entirely to the domain of languages (like SQL and PL/pgSQL) and 
as such they don't "exist" anywhere except there (as notions) and then in 
specific language utterances (as instances of the notion). For example, just 
because « select * from "no such table" » causes the 42P01 error (when the 
identifier denotes a name that meets the expectation of its plain English 
reading), this doesn't mean that "no such table", qua identifier, didn't exist. 
After all, it exists in the SQL text where you see it. But I'll readily agree 
that, in most contexts of documentation and the like, the intended meaning is 
clear and it can be too hard (and even put off readers) always to be a stickler 
for the precise and correct use of terms of art.

> I've enjoyed this conversation precisely because it forces me to dig deeper, 
> think more critically, and understand the reasons behind the system's design 
> better. Yet in terms of being able to properly use search_path to achieve a 
> goal the couple of paragraphs in the documentation are sufficient for all 
> practical purposes I can see. And do not, with any frequency, seem to 
> generate questions from our users.

It's a huge relief to hear this, David. I was afraid that I might be annoying 
you. Yes, I'll agree with your "for all practical purposes" stance—but with a 
caveat. There's a famous SQL injection example, beloved of bloggers, that rests 
on the fact that a developer didn't handle the possibility that, say, a table 
might have an exotic name. Then, maybe, they get the name from some metadata 
and build the text of a SQL statement that uses it. Because they don't know the 
risks, and don't have the concepts and the associated well-defined terms of art 
to discuss what they need to consider with themselves, they neglect to use 
available techniques to avoid risks.
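
Here's a minimal sketch of the kind of thing I mean (entirely my own 
illustration; the table name is deliberately exotic):

create table "My ""odd"" table"(n int);

do $body$
declare
  tab_name text := 'My "odd" table';  -- imagine this arrived from some metadata query
  n int;
begin
  -- Unsafe: concatenating the raw name breaks (or worse) on exotic names:
  --   execute 'select count(*) from '||tab_name into n;

  -- Safe: %I double-quotes the identifier whenever that's needed:
  execute format('select count(*) from %I', tab_name) into n;
  raise info 'count: %', n;
end;
$body$;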

I've all too often managed to survive with a partial understanding (in any 
corner, using any software setting) and then managed to slip up when I stray 
from the typical scenarios. Just like the apocryphal developer who doesn't 
double-quote supposed identifiers that come from some external source. That's 
why I strive to get the deepest possible understanding of things (and this 
includes "why was it done this way?"). So thank you very much for helping me 
with the present particular corner of PostgreSQL that's been my latest 
obsession. I'm ready to lay it aside now, and move on...



Re: Seems to be impossible to set a NULL search_path

2022-07-06 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> As for "schema identifiers" vs. "schema names" - they both seem equally 
> wrong. The list can very much contain sequences of characters that when 
> interpreted as an identifier and looked for in the pg_namespace catalog do 
> not find a matching entry and are therefore by definition not the name of any 
> known schema in that database.
> 
> Besides, I hazard to guess how many times we write "table name" and "column 
> name" in the documentation when your argument is that "table identifier" and 
> "column identifier" is the correct choice.  No, rather "name" and 
> "identifier" in the context of database objects are known to mean the same 
> thing - the alphabetic name of the object.

Well, "putative" or "candidate" can be used to resolve your existence 
criterion. But why bother? In my book, Bertie Wooster (or Bertram Wilberforce 
Wooster if you prefer) is a perfectly fine candidate name in the general 
English speaking culture. It's neither here nor there if there happens to be 
any living person who has the name...

But never mind. If you'd like a diverting read on this topic, go here:

https://blogs.oracle.com/sql/post/a-collection-of-plsql-essays

look for this, and download the PDF:

«
Names vs identifiers

Databases are full of things: tables, sequences, columns, views, PL/SQL units, 
what have you. Things have names and are manipulated by mentioning the names. 
The programming languages SQL and PL/SQL use identifiers, not names. Questions 
show many programmers are confused about the difference. This note describes 
the relationships between things, names, and identifiers. Once the programming 
rules are absorbed, developers can write code faster and with less heartburn.
»

It's written by a former colleague with whom I spent many happy hours 
discussing the topic.

Over and out?



Re: Seems to be impossible to set a NULL search_path

2022-07-13 Thread Bryn Llewellyn
> b...@yugabyte.com wrote:
> 
>> david.g.johns...@gmail.com  wrote:
>> 
>> As for "schema identifiers" vs. "schema names" - they both seem equally 
>> wrong. The list can very much contain sequences of characters that when 
>> interpreted as an identifier and looked for in the pg_namespace catalog do 
>> not find a matching entry and are therefore by definition not the name of 
>> any known schema in that database.
>> 
>> Besides, I hazard to guess how many times we write "table name" and "column 
>> name" in the documentation when your argument is that "table identifier" and 
>> "column identifier" is the correct choice.  No, rather "name" and 
>> "identifier" in the context of database objects are known to mean the same 
>> thing - the alphabetic name of the object.
> 
> Well, "putative" or "candidate" can be used to resolve your existence 
> criterion. But why bother? In my book, Bertie Wooster (or Bertram Wilberforce 
> Wooster if you prefer) is a perfectly fine candidate name in the general 
> English speaking culture. It's neither here nor there if there happens to be 
> any living person who has the name...
> 
> But never mind. If you'd like a diverting read on this topic, go here:
> 
> https://blogs.oracle.com/sql/post/a-collection-of-plsql-essays 
> 
> 
> look for this, and download the PDF:
> 
> «
> Names vs identifiers
> 
> Databases are full of things: tables, sequences, columns, views, PL/SQL 
> units, what have you. Things have names and are manipulated by mentioning the 
> names. The programming languages SQL and PL/SQL use identifiers, not names. 
> Questions show many programmers are confused about the difference. This note 
> describes the relationships between things, names, and identifiers. Once the 
> programming rules are absorbed, developers can write code faster and with 
> less heartburn.
> »
> 
> It's written by a former colleague with whom I spent many happy hours 
> discussing the topic.
> 
> Over and out?

Something has been nagging me since I sent this. I was sure that I’d seen this:

"Bllewell"

in some catalog view where I expected the name of a role. I just found it in 
pg_proc. The spelling Bllewell with no quotes is the name of an o/s user on my 
MacBook. I don't know why it ends up in my MacBook's PG installation—but that's 
a story for a different day. My PG cluster has a database called demo. I just 
did this test:

\c demo postgres

-- What a pain to have to do this to get silent "if exists" behavior.
do $body$
begin
  begin
drop owned by "Exotic Me" cascade;
  exception when undefined_object then
null;
  end;
  begin
drop owned by """Exotic Me""" cascade;
  exception when undefined_object then
null;
  end;
end;
$body$;

create role "Exotic Me" login password 'p';
grant connect on database demo to "Exotic Me";

create role """Exotic Me""" login password 'p';

\c demo "Exotic Me"
create schema s;

create procedure s.p()
  language plpgsql
as $body$
begin
  null;
end;
$body$;

with c as (
  select
rolname   as role_name
  from pg_roles
  union all
  select
distinct proowner::regrole::text  as role_name
  from pg_proc)
select role_name
from c
where
  lower(role_name) like '%bllewell%' or
  lower(role_name) like '%exotic%';

This is the result:

  role_name  
-------------
 Bllewell
 Exotic Me
 "Exotic Me"
 "Bllewell"

Of course I understand why I see both Exotic Me with no quotes and "Exotic Me" 
with double quotes: I asked for it. But why do I see both Bllewell with no 
quotes and "Bllewell" with double quotes? Are there really two distinct roles 
with those two names? Or did pg_roles end up with the identifier for the exotic 
name Bllewell rather than the bare name itself?

And on the disputed notion that the identifier for a name is a distinct 
phenomenon from the name itself, I noted this here:
https://www.postgresql.org/docs/current/datatype-oid.html

«
The input functions for these types allow whitespace between tokens, and will 
fold upper-case letters to lower case, except within double quotes; this is 
done to make the syntax rules similar to the way object names are written in 
SQL. Conversely, the output functions will use double quotes if needed to make 
the output be a valid SQL identifier.
»

It comes close (but no cigar) to making the distinction. It uses both "object 
name" and "SQL identifier" and says (more or less) that a "SQL identifier" is 
the way an object name is written in SQL by double-quoting it. I'm using 
"exotic" as a tautological shorthand for what you *must* surround with 
double-quotes in SQL and PL/pgSQL to express what you want.



Re: Seems to be impossible to set a NULL search_path

2022-07-13 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>>   role_name  
>> -------------
>>  Bllewell
>>  ...
>>  "Bllewell"
>> 
>> ...Are there really two distinct roles with those two names?...
> 
> Is this another one of your mistakes in presenting a self-contained test case?

I was far, far too elliptical in what I wrote. Sorry, David. Yes, that was a 
mistake. These are the two fundamental observations that got me started on this:

Observation #1 (the native content of "pg_roles"):

select rolname as role_name from pg_roles order by 1;

 role_name 
-------------------
 Bllewell
 pg_database_owner
 ...
 postgres

Observation #2 (the native content of "pg_proc"):

select
  proowner::regrole as role_name,
  proname
from pg_proc
order by 1, 2
limit 3;

 role_name  |       proname       
------------+---------------------
 "Bllewell" | RI_FKey_cascade_del
 "Bllewell" | RI_FKey_cascade_upd
 "Bllewell" | RI_FKey_check_ins

The clue to the different spellings is the ::regrole typecast. I knew all along 
that I could get the name of the owner of a function from: 

pg_proc p inner join pg_roles r on p.proowner = r.oid

But I wondered if I could find an operator (using the word loosely) to save me 
some typing. Google took me to the "Postgres get function owner" post on Stack 
Exchange:
https://dba.stackexchange.com/questions/259001/postgres-get-function-owner

Laurenz Albe answered the question thus:

«
The complicated way is to join with pg_roles and get the user name from there. 
But since PostgreSQL hackers don't want to type more than necessary, they came 
up with something else... Internally, every object is... identified.. by its 
“object ID”. This is the number that is for example used in the proowner column 
of pg_proc... but the type output function, which is used for display of the 
type, renders it as the object's name. Put that together with PostgreSQL's type 
cast operator ::, and you can do... ::regrole.
»

This was exactly what I wanted! I took Laurenz's « ::regrole renders proowner 
as the [owner]'s name » at face value. But when the name happens to be exotic, 
as "Joe" is, you see that ::regrole in fact renders proowner as the *identifier* 
for the function's owner's name. The testcase at the end makes the point more 
explicitly. It does some set-up that lets me do this:

select
  r.rolname            as "The owner name",
  p.proowner::regrole  as "The *identifier* for the owner name"
from
  pg_roles r
  inner join
  pg_proc p
  on r.oid = p.proowner
where p.proname = 'something_strange';

 The owner name | The *identifier* for the owner name 
----------------+-------------------------------------
 Joe            | "Joe"

This is what I've been banging on about all the time. It seems that I'm the 
only person in the pgsql-general community who wants some proper terms of art 
(as I used them in my column aliases) that let me say, simply and clearly, why 
pg_proc.proowner::regrole gives a differently spelled answer than does 
pg_roles.rolname.

B.t.w., I looked at "8.19. Object Identifier Types". It uses the term 
"identifier" in the more ordinary sense of "unique identifier" (as are the 
values in a table's PK column that might be numeric, text, uuid, or whatever). 
It mentions quote_ident(). I tried it thus:

select quote_ident('Joe') as joe, quote_ident('"Dog"') as dog;

  joe  |    dog    
-------+-----------
 "Joe" | """Dog"""

(so that's right). And it goes on to say "...names that require quoting". So 
the difference is in the air and is hinted at with "ident" and "name". But the 
wording doesn't tie things down.

I'll shut up on this now.

--
-- Testcase setup

\c postgres postgres
create role "Joe" login password 'p';

create database tmp;
grant connect on database tmp to "Joe";
grant create on database tmp to "Joe";

\c tmp "Joe"
create schema s;
create procedure s.something_strange()
  language plpgsql
as $body$
begin
  null;
end;
$body$;



PL/pgSQL: « arr[j].a := v » works fine in PG Version 14.4, fails to compile in Version 11.2. Which version brought the fix?

2022-07-25 Thread Bryn Llewellyn
I copied my testcase at the end. It runs OK and produces the output that I 
expect using PG Version 14.4. But using Version 11.9 (and earlier 11 
sub-versions), it fails to compile with this error:

syntax error at or near "."
lhs[j].a := rhs[j].a;
  ^
If I comment out the "Ideal approach" loop, then it runs fine in PG 11 and 
produces the same output as the "Ideal approach" does in PG 14.

I looked at the Release Notes for each of Versions 14, 13, and 12:

https://www.postgresql.org/docs/12/release-12-1.html 

https://www.postgresql.org/docs/13/release-13-1.html 

https://www.postgresql.org/docs/14/release-14-1.html 


I searched in the page for "array". But there were no relevant hits on any of 
those three pages.

Can anybody tell me which PG release brought the improvement?

p.s., I'm asking because YugabyteDB still uses the PG 11.2 SQL processing code. 
I'm promised that fairly soon a new YugabyteDB version will use the PG 13 SQL 
processing code. I'm hoping that, then, I'll be able to retire my workaround.



create type t as (a text, b text);

create function f()
  returns table(z text)
  language plpgsql
as $body$
declare
  lhs  t[] not null := array[('??', '??'), ('??', '??'), ('??', '??')];
  rhs constant t[] not null := array[('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')];

  r    t not null := ('??', '??');
begin
  for j in 1..3 loop
z := rhs[j].a||', '||rhs[j].b;return next;
  end loop;

  -- Workaround
  for j in 1..3 loop
r := rhs[j];
r.a := rhs[j].a;
r.b := rhs[j].b;
lhs[j] := r;
  end loop;

  -- Ideal approach.
  for j in 1..3 loop
lhs[j].a := rhs[j].a;
lhs[j].b := rhs[j].b;
  end loop;

  z := '';return next;
  for j in 1..3 loop
z := lhs[j].a||', '||lhs[j].b;return next;
  end loop;
end;
$body$;

select z from f();



Re: PL/pgSQL: « arr[j].a := v » works fine in PG Version 14.4, fails to compile in Version 11.2. Which version brought the fix?

2022-07-25 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> I copied my testcase at the end. It runs OK and produces the output that I 
>> expect using PG Version 14.4. But using Version 11.9 (and earlier 11 
>> sub-versions), it fails to compile with this error:
>> 
>> syntax error at or near "."
>> lhs[j].a := rhs[j].a;
>>   ^
>> 
>> If I comment out the "Ideal approach" loop, then it runs fine in PG 11 and 
>> produces the same output as the "Ideal approach" does in PG 14.
>> 
>> I looked at the Release Notes for each of Versions 14, 13, and 12:
> 
> 
> From the change to this page I'd say v14:
> 
> https://www.postgresql.org/docs/14/plpgsql-statements.html
> 
> The release notes for v14 say:
> 
> «
> PL/PgSQL: 
> Improve PL/pgSQL's expression and assignment parsing (Tom Lane)
> This change allows assignment to array slices and nested record fields.
> »

Thanks for the quick reply, David.

I looked at "43.5.1. Assignment" 
(https://www.postgresql.org/docs/14/plpgsql-statements.html#PLPGSQL-STATEMENTS-ASSIGNMENT).

I searched in the page for « ]. » and found this:

complex_array[n].realpart = 12.3;

The corresponding page for Version 13 has no hits for « ]. ». So that seems to 
be a clear answer to my question.



« The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree... »

2022-07-28 Thread Bryn Llewellyn
The subject line is copied from "PL/pgSQL under the Hood" 
(https://www.postgresql.org/docs/current/plpgsql-implementation.html). It 
implies the question:

« What does the term "parse" mean? »

I couldn't find more than what I quoted. Have I missed something?

Anyway, I tried some tests. Here's an example that aims to make a reasonable 
compromise between brevity and its capacity to illustrate. First, I create a 
domain and then leave it unchanged:

create domain tt as text[];

Then I do this:

create or replace function f()
  returns text
  language plpgsql
as $body$
declare
  n int;
  arr tt := array['dog', 'cat'];
begin
  n := (select count(*) from (select unnest(arr)) as a);
  return n::text;
end;
$body$;

\sf+ f()
select f();

The "create or replace" completes without error and the "select" runs to 
produce the result, 2, that I expect.

If I simulate a typo by changing "n" on the LHS of the assignment to "m", then 
I get this error at "create or replace" time:

"m" is not a known variable

Moreover, "\sf+" shows that the former definition has remained intact—as I've 
come to expect.

If I fix the "n" typo and simulate a second typo by changing "tt" in the 
declaration of "arr" to "tz", then I get this error at "create or replace" time:

type "tz" does not exist

If I fix the "tz" typo and simulate a third typo by changing "arr" in the 
scalar subquery expression to "art", then "create or replace" completes without 
error and "\sf+" confirms that the new source is in place. Then, at "select" 
time, I get this error:

column "art" does not exist

So far, I'm tempted to think that "parse" covers everything about "regular" 
(i.e. not embedded SQL) PL/pgSQL statements, including syntactic analysis *and* 
the resolution of identifiers—both within the scope of the to-be-created 
subprogram and within schema scopes. 

But, as it seems, embedded SQL statements receive only syntactic 
analysis—leaving the resolution of identifiers (even when this can be done in 
the scope of the to-be-created subprogram) to runtime. (I tried changing "from" 
to "frim" and that caused a syntax error.)

Then I dropped "f()" and extended the test, thus:

create or replace function f()
  returns table(z text)
  language plpgsql
as $body$
declare
  v_sqlstate text not null := '';
  v_message  text not null := '';
  n int;
  arr tt := array['dog', 'cat'];
begin
  z := (select count(*) from (select unnest(arr)) as a)::text; return next;
exception when others then
  get stacked diagnostics
  v_sqlstate = returned_sqlstate,
  v_message  = message_text;

  z := ''; return next;
  z := v_sqlstate; return next;
  z := v_message;  return next;
end;
$body$;

\sf+ f()
select f();

"create or replace" succeeds and "select" reports what I expect: 2. Now if I 
change "arr" to "art", I get the error report from my "others" handler that I 
expect:

 42703
 column "art" does not exist

If I fix "art" back to "arr" and change "v_message" in "z := v_message;  return 
next;" to "q_message", then "create or replace" succeeds—very much to my 
surprise. Moreover; "select" succeeds too—presumably because the point of 
execution never enters the "others" handler. Only if (with the "q_message" 
typo still in place) I change "arr" to "art" again, do I get this error on 
"select":

column "q_message" does not exist

Is this expected? In other words, is there a careful explanation of what 
"parse" means in the context of "create or replace" for a subprogram that 
predicts all of the outcomes that I reported here?

Or might my final observation be considered to be a bug—and if so, might it be 
fixed?



Re: « The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree... »

2022-07-28 Thread Bryn Llewellyn
> x...@thebuild.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Is this expected?
> 
> Yes.  This isn't a bug… Database objects (such as tables and columns) are 
> left as identifiers until they are executed, because that is the point at 
> which a plan for those statements is created.  The other components of 
> PL/pgSQL are translated to internal form (and thus checked for existence) at 
> compile time.

My example was carefully contrived to test what you said—which is what I had 
earlier understood. My deliberate typo thus:

>> change "v_message" in "z := v_message;  return next;" to "q_message"

(surely) has nothing to do with possible database objects. The context is a 
straight PL/pgSQL assignment statement (with no scalar subquery in sight).

It's this that surprises me. And it's this, and only this, that I'm asking 
about: might _just_ this be a fixable bug?

Re: « The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree... »

2022-07-28 Thread Bryn Llewellyn
> t...@sss.pgh.pa.us wrote:
> 
>> x...@thebuild.com wrote:
>> 
>> This isn't a bug.
> 
> It's actually a feature…
> 
> Having said that, there are certainly aspects of what happens when in plpgsql 
> that don't have a lot of justification other than being implementation 
> artifacts…

Thanks, Tom. I'll take your « aspects of… plpgsql [are simply] implementation 
artifacts » to mean that my hope to understand what is checked at "create or 
replace" time and what is checked first at runtime is futile.

There does seem to be a general rule. But, as my example shows, there are 
exceptions to the rule. And it's impossible to make a simple user-facing 
statement of what determines "exceptional" status.

I suppose that the conclusion is clear: you can't be sure that a subprogram is 
good until every single code path (in the basic block coverage sense of this) 
has been tested. But, anyway, it was ever thus. (Error-free compilation never 
did guarantee error-free runtime outcomes.)

I'll call this "case closed" then.

Re: « The PL/pgSQL interpreter parses the function's source text and produces an internal binary instruction tree... »

2022-07-29 Thread Bryn Llewellyn
> x...@thebuild.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> It's this that surprises me. And it's this, and only this, that I'm asking 
>> about: might _just_ this be a fixable bug?
> 
> It might be surprising, but it's not a bug. You can demonstrate it with a 
> very small test case… ["create" succeeds but it] gets an error on execution… 
> Assignment in PL/pgSQL is essentially syntactic sugar around a SELECT ... 
> INTO. Note, however, that this does *not* compile… The reasons, such as they 
> are, are based in how PL/pgSQL processes SQL statements.
> 
> "SELECT x INTO y;"
> turns it into "SELECT x   ;"
> 
> This has the virtue that… but it does result in some of the implementation 
> poking through.

Thanks for those two maximally terse examples, Christophe. They illustrate the 
same point that my larger examples aimed at. (Forgive me for not working more 
to distill mine down to what you showed.)
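
For the record, here's my own attempt at the same terseness (placeholder names; 
not Christophe's exact examples):

-- "create" succeeds; the error surfaces only at execution time, because the
-- RHS of the assignment becomes a SELECT whose column is resolved at run time.
create function g1()
  returns int
  language plpgsql
as $body$
declare
  x int;
begin
  x := no_such_thing;
  return x;
end;
$body$;

select g1();
-- ERROR:  column "no_such_thing" does not exist

-- This one doesn't even compile, because the LHS must be a known variable.
create function g2()
  returns int
  language plpgsql
as $body$
begin
  no_such_var := 1;
  return 1;
end;
$body$;
-- ERROR:  "no_such_var" is not a known variable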

> Unquestionably, this is surprising!


Well, surprise is in the eye of the beholder. I was surprised at first because 
I hadn't joined the dots from:

« how PL/pgSQL evaluates expressions »

to

«
the moments at which the different flavors of "identifier could not be 
resolved" error surface:

% is not a known variable

or

column % does not exist
»

But now I've changed the way that I see this—thanks to your replies and to 
Tom's. See my reply to Tom here:

https://www.postgresql.org/message-id/4DC7ED5B-BADD-4330-B481-76490D3B319E%40yugabyte.com
 


I'll now adopt a very simple model for when "identifier could not be resolved" 
errors surface:

« Some surface at "create or replace" time. But many don't surface until 
runtime. It doesn't help me to look for a reliable specific predictive model 
here. »

This is what matters:

— The fact that the semantics of (embedded) SQL and expression evaluation come 
down to a single implementation, and are therefore identical in both top-level 
SQL and in PL/pgSQL, is enormously valuable. (This stands in stark contrast to 
Oracle's PL/SQL, where there are two implementations that bring inevitable 
divergences in semantics and limitations.)

— Self-evidently, runtime testing is all that ultimately matters. The more of 
this I do, and the sooner I do it, the better will be my outcomes.

— The practical advantages of later semantic checking that you've both pointed 
out are huge. For example, create a temporary table and use it *in the same 
block statement*.
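
A minimal sketch of that last point (the table name is just a throwaway of 
mine):

do $body$
begin
  create temporary table scratch(v int) on commit drop;
  insert into scratch(v) values (1), (2), (3);
  raise info 'rows: %', (select count(*) from scratch);
end;
$body$;

The reference to "scratch" later in the block can't be resolved when the block 
is parsed; it resolves only when that statement executes, by which time the 
table exists.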

And now (for the second time) "case closed".



Surprisingly forgiving behavior when a case expression is terminated with "end case"

2022-08-10 Thread Bryn Llewellyn
The account of the CASE expression here:

https://www.postgresql.org/docs/current/functions-conditional.html#FUNCTIONS-CASE

says that it's terminated with the keyword END (just as I've always 
understood)—i.e. not with the PL/pgSQL CASE statement's  END CASE.

Moreover CASE is a reserved word—as a "create table case(…)" attempt shows. Yet 
CASE is tolerated (using PG 14.4) here:

select 1 as case;

In fact, any reserved word that I try (like IF, THEN, and so on) is accepted as 
an alias. This seems to me to be wrong. What do you (all) think?

This outcome inspired this test:

create table t(k serial primary key, v text);
insert into t(v) values ('a'), (null);
select
  k,
  case
    when v is null then ''
    else v
  end case
from t order by k;

I suppose that this is showing nothing more than what I already did. Everything 
that I've shown so far behaves the same in PG 11.

So then I tried the "typo" in a PL/pgSQL subprogram:

create function f(arr in text[])
  returns text
  language plpgsql
as $body$
declare
  a text;
  r text := '';
begin
  foreach a in array arr loop
    a := case
           when a is null then ''
           else a
         end case;
    r := r||a||', ';
  end loop;
  return r;
end;
$body$;

select f(array['a', null::text, 'b']);

The "create function" succeeds. And the "select f()" executes without error to 
produce the result that I expect. In PG 14.4.

But in PG 11.9, the "create function" causes this error:

ERROR:  syntax error at or near "case"
LINE 13:  end case;

It seems, then, that at some version boundary between PG 11 and PG 14, 
forgiveness was introduced in this scenario, too.

Was this change to forgive what seems to me to be a straight syntax error 
deliberate? After all, you (all) thought it to be a syntax error in some 
scenarios in PG 11—but just not so in all scenarios.

Was it that the original sin of forgiveness in some scenarios could not be 
corrected because of the stronger requirement not to break existing code? And 
so this led to a "bug" fix to forgive that sin more uniformly? If so, then I 
suppose that you might say something in the doc. But there is still a wrinkle. 
This:

select
  k,
  case
    when v is null then ''
    else v
  end dog
from t order by k;

runs without error. But this (in PG 14.4)

select
  k,
  case
    when v is null then ''
    else v
  end case dog
from t order by k;

still fails with a syntax error:

ERROR:  syntax error at or near "dog"
LINE 6:   end case dog

So even in "current", the "end case" typo isn't forgiven in all scenarios.

p.s. You can guess that I stumbled on this in the context of a fairly large 
demo app where just one of the subprograms had the "end case" typo that I 
showed above. Nothing prompted me to spot my mistake until I tested my code 
using YugabyteDB. I'm embarrassed to say that our current version still uses 
the PG 11 SQL processing code. But a soon-to-be-published new YB version will 
use PG 13. And "soon" after that, we hope to remain current with the current PG.



Re: Surprisingly forgiving behavior when a case expression is terminated with "end case"

2022-08-10 Thread Bryn Llewellyn
t...@sss.pgh.pa.us wrote:

> I think we've spent a great deal of blood, sweat, and tears making that so, 
> or as nearly so as we could. We will in fact take any keyword after "AS", and 
> in recent versions many non-reserved keywords will work that way without "AS".
> 
> (Mind you, I think the SQL spec made a serious design error in allowing "AS" 
> to be optional. But we have to live with that as best we can.)

Also, adrian.kla...@aklaver.com wrote:

> Even reserved key words are not completely reserved in PostgreSQL, but can be 
> used as column labels (for example, SELECT 55 AS CHECK, even though CHECK is 
> a reserved key word):
> https://www.postgresql.org/docs/current/sql-keywords-appendix.html

Thank you both. I never would have guessed that a word with "reserved" status 
could be used as a column alias (with or without preceding it with AS). "not 
completely reserved in PostgreSQL" makes this sound like a PG special.

So I was caught out yet again. And I fear that I'll continue to be caught out 
with other things—and maybe this one too, at some later date, when I've 
forgotten the present exchanges…

Anyway, I believe that I have the answer to my question. And my new mental 
model allowed me to predict that, as presented, this would work:

create function f()
  returns text
  language plpgsql
as $body$
declare
  a constant int := 3;
  b constant int := 5;
  c constant int := 7;
begin
  return a + b case /* + c */;
end;
$body$;

select f();

It does! It predicted, too, that when "+ c" is uncommented, "create function" 
would fail with a syntax error. And that prediction also held up.



Surprising results from current_role in a "security invoker" trigger function in a "cascade delete via FK" scenario

2022-08-10 Thread Bryn Llewellyn
My code example ended up quite big—so I'll show it to you all only if you ask. 
But it's easy to describe. My script does this:

1. It creates three non-superuser roles: "data", "code", and "client".

2. It creates a text-book masters-and-details table pair with owner "data".

Each table has the obvious PK column. The "details" table has the obvious FK 
column. And each table has a payload column "v". The FK constraint is defined 
thus:

  constraint details_fk foreign key(mk)
references masters(mk)
match full
on delete cascade
on update restrict
initially deferred

3. It creates a DELETE trigger at each timing point on each table.

That's eight in all: [before|after] * [row|statement] * [masters|details].

The trigger functions have the same names as the triggers. And each name pair 
reflects the three degrees of freedom—for example "before_statement_masters".

Each trigger function simply does a "raise info" to report its name, the return 
value from "current_role" (and, in the "row" case, the value of "old.v"). And 
then it does "return null".

The trigger functions are owned by "data" and are explicitly marked "security 
invoker". (The results are the same without this marking—as expected.)

(In my real use case, the trigger functions are marked "security definer". But 
I did this test in order to understand the rules.)

"data" grants appropriate privileges to "code" to let its functions succeed.

4. It creates three "security definer" procedures with owner "code"

"cr_m_and_ds()" inserts specified "details" rows and then their specified 
"masters" row. (That's why the FK constraint is "initially deferred". The 
use-case from which this test is derived needs this.)

"del_m()" deletes a specified "masters" row—which implies the cascade-delete of 
its details.

"del_ds()" deletes all the "details" rows for a specified master.

"code" grants "execute" on these to "client". ("client " owns no objects.)

5. "client" invokes "code.cr_m_and_ds()".

It's called to insert a single "masters" row with "masters.v" set to 'Mary' and 
a single "details" row with "details.v" set to 'shampoo'.

6. "client" invokes "code.del_m()" on 'Mary'.

Here's what I see:

before_statement_masters: [code]
before_row_masters: [code] Mary
before_statement_details: [data]
before_row_details: [data] shampoo
after_row_masters: [code] Mary
after_statement_masters: [code]
after_row_details: [code] shampoo
after_statement_details: [code]

(I stripped the noisy "psql:0.sql:32: INFO:" preamble for each output line by 
hand.)

I was surprised that the value from "current_role" is *not* the table owner, 
"data", in all cases. (This is how triggers behave in Oracle Database.) Rather, 
it's mainly (but not always) "code". I could be persuaded that, in the 
cascade-delete case, the invoking role is the owner of the "masters" table 
rather than the role, "code" that performs the "delete" from "masters"—but that 
would maybe be a stretch. Anyway, if this is the intention, why is it like this 
only for the "before" timing points for the triggers on "details"?

7. Starting with the same 'Mary'-'shampoo' pair, client invokes "code.del_ds()" 
on 'Mary'

With the same set-up, and using this instead of "del_m()", this is the result:

before_statement_details: [code]
before_row_details: [code] shampoo
after_row_details: [code] shampoo
after_statement_details: [code]

Here the value for "current_role" from each trigger is the same. These results 
are in line with the common case in the first test.

I read the section "Triggers on Data Changes" 
(https://www.postgresql.org/docs/current/plpgsql-trigger.html#PLPGSQL-DML-TRIGGER).
 But there's no hit on the page for any of "security", "invoker", or "definer". 
And I couldn't find wording without these terms that addresses what I describe 
here.



Re: Surprising results from current_role in a "security invoker" trigger function in a "cascade delete via FK" scenario

2022-08-10 Thread Bryn Llewellyn
david.g.johns...@gmail.com wrote:

> …you should spend some time making a smaller code example that still shows 
> the desired behavior but can be easily read and executed by others. In 
> particular, your description of simply returning NULL for all triggers seems 
> suspect. If only two of the eight triggers show the problem then the example 
> only needs two triggers to show the presence of the unexpected current_role 
> and to get clarity why it is that. All the stuff that is working as expected 
> is just noise; that is the stuff that can be summed up with words on a first 
> pass.
>  
> As the behavior you are pointing out has nothing to do with pl/pgsql 
> specifically, but rather the runtime environment of triggers in the server, 
> it is not surprising the lack of discussion of this topic in that part of the 
> documentation.

You said "simply returning NULL for all triggers seems suspect.". Yes, it would 
be! I took my approach from "The usual idiom in DELETE triggers is to return 
OLD." in the "Overview of Trigger Behavior" section. And, indeed, I do say 
"return old" in the code of each of my eight triggers. I'm embarrassed to say 
that I simply made a typo in the account in my email.
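
In shorthand, then, each of the eight follows this pattern (a much-reduced 
sketch of mine, not the generated code; the stand-in "details" table takes the 
place of the one described earlier):

-- stand-in for the "details" table from my description
create table details(mk int, v text);

create function before_row_details()
  returns trigger
  security invoker
  language plpgsql
as $body$
begin
  raise info 'before_row_details: [%] %', current_role, old.v;
  return old;
end;
$body$;

create trigger before_row_details
  before delete on details
  for each row
  execute function before_row_details();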

I'll be happy to make a smaller example. It will, however, need to create 
users, a database, schemas, and some number of triggers. Because the triggers, 
their functions, and everything else about them follow a pattern, I can use 
"format()" and dynamic SQL to generate them. I'll still need those three 
"security definer" procedures to make the table changes that I explained. And 
the code to call these procedures to implement the test. So the result won't be 
exactly small. But, while I'm generating the triggers, I may just as well 
generate all eight. After all, how would I know which of the eight to skip 
while I don't know the intended rules for the current_role?

Am I missing something about how this list is meant to work? Email attachments 
don't make it to the archive for posts to this list 
(https://www.postgresql.org/list/pgsql-general/). Is there a reliable place 
where I can post a code .zip so that readers of the list can download it? I 
asked ages ago about a GitHub-style scheme for filing and tracking PG issues. 
but I was told that this email list, and its cousins, is the system that you 
(all) prefer.

Meanwhile, I'd hoped that what I said would prompt a simple statement of what 
rules are intended. You implied that it's elsewhere in the doc than what I 
thought would be the place to look. A cross-reference to that section, from the 
"Triggers on Data Changes" section (and other sections like "CREATE TRIGGER") 
would help.

Could you please point me to where the statement of rules that I'm seeking is 
made? Or, failing that, simply tell me what the intended rules are?

Should I read your "All the stuff that is working as expected is just noise; 
that is the stuff that can be summed up with words on a first pass." to mean 
that the intended rule is that the current_role in a "security invoker" trigger 
function is the role that does the trigger-firing DML on the table?

B.t.w., here's what I had tried before sending the email that started this 
thread. (I didn't want to bore you all with this long story.)

I looked in the "CREATE TRIGGER" section. I found this:

https://www.postgresql.org/docs/current/sql-createtrigger.html#SQL-CREATETRIGGER-NOTES
« To create or replace a trigger on a table, the user must have the TRIGGER 
privilege on the table. The user must also have EXECUTE privilege on the 
trigger function. »

I suppose that this covers the case where a pre-existing trigger function has a 
different owner than will the to-be-created trigger that wants to use it.

This rule is certainly in the general space where the answer to my question 
might be found. But this doc extract doesn't answer it explicitly. However, it 
does seem to imply that the invoking role for a trigger function will be the 
owner of the trigger that uses it. And in my example, this was the table owner. 
(In other words, the Oracle Database rule.) But this is not what I see in six 
out of the eight of my tests.

There are no hits on the "CREATE TRIGGER" page for "current_role" or 
"current_user" (with or without the underscore). Neither are there any hits if 
I say "effective user" or "effective role". I don't know what else to try.

I looked in  the "CREATE FUNCTION" section. It does, of course, explain the 
"security invoker" and "security definer" notions. But doesn't say anything 
there to inform my question. (There are only two hits for the word "trigger" on 
the page. And neither is relevant for my question.)

I tried "Overview of Trigger Behavior" 
(https://www.postgresql.org/docs/current/trigger-definition.html). But again 
found nothing.

Of course, I tried Google, and the PG doc's own search, for all the w

Re: Surprising results from current_role in a "security invoker" trigger function in a "cascade delete via FK" scenario

2022-08-11 Thread Bryn Llewellyn
> karsten.hilb...@gmx.net wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> I'll be happy to make a smaller example. It will, however, need to create… 
>> After all, how would I know which of the eight to skip while I don't know 
>> the intended rules for the current_role?
> 
> You'd certainly start out with all eight but then whittle down to what still 
> exhibits the problem and post that.

Do you know where I can read a statement of the intended rules here? I 
appreciate that one is doomed who tries to deduce the rules that govern a 
software system's behavior by using just empirical testing. (And reading source 
code hoping to deduce the behavior that the programmer intended is hardly 
better.)

I used the subject "surprising results" to mean "Results that surprise me, 
Bryn". The results might well not surprise somebody who knows the rules. 
Several cases that I've asked about before on this list were surprising for me 
because I was too dim-witted to find where, in the PG docs, the rules were 
stated. And in those cases, I was delighted to be pointed to the appropriate 
doc and to receive some helpful instruction. That's what I'm hoping for here.

Notice that I didn't consider "for insert" or "for update" triggers. But you 
can contrive a cascade effect with these, too. For example, table "t1" might 
have a trigger that inserts or updates a row in table "t2" for a purpose like 
maintaining a change history. And "t2" might, in turn, have a trigger for 
who-knows-what purpose (maybe to enforce a write-once-read-many regime for the 
values in certain columns).

This is why I'd very much like to start by studying a clear statement of the 
intention in scenarios in the same general class as the one that I showed.

Re: Surprising results from current_role in a "security invoker" trigger function in a "cascade delete via FK" scenario

2022-08-11 Thread Bryn Llewellyn
> david.g.johns...@gmail.com wrote:
> 
> You are correct that the behavior here is not explicitly documented [where] I 
> would expect to find it.
> 
> My expectation is that the trigger owner is the context in which the trigger 
> function is executed. Table owners can grant the ability to other roles to 
> create triggers on the owned tables should the trigger owner and table owner 
> require different roles. Triggers are like views in this regard.
> 
> I don't expect cascade update/cascade delete to be a factor here at all, 
> other than making the process easier to perform. This extends from the prior 
> expectation.
> 
> I expect [all this] not because I can point to a place where all that is said.
> 
> I would for sure expect deviations to be mentioned, and would find explicit 
> documentation to be reasonable to add if someone pushes forward such a change.

Good. We're converging. Thanks, David. I think that this is a fair summary:

1. The PG doc very likely has no clear statement, anywhere, of the rules that 
govern the behavior in the class of trigger scenarios under discussion.

2. An expectation of what the rules are has emerged:

> the invoking role for a trigger's function is the role[*] that owns the 
> trigger.

Thereafter, the privilege domain in which the function executes is governed by 
the ordinary, separable, rules about "security definer" versus "security 
invoker".

I mentioned this from the "CREATE TRIGGER" section:

> https://www.postgresql.org/docs/current/sql-createtrigger.html#SQL-CREATETRIGGER-NOTES
>  
> 
> To create or replace a trigger on a table, the user must have the TRIGGER 
> privilege on the table. The user must also have EXECUTE privilege on the 
> trigger function.

It gives a strong hint that David's expectations are sound.

3. The outcomes with the eight triggers that I tested show straight buggy 
behavior in six cases. Moreover, because David said "don't expect cascade… to 
[matter]", the outcomes in the other two cases might show correct behavior only 
by accident.

It seems to me, therefore, that a carefully constructed, "single click", 
reproducible testcase is needed. I have this on my laptop. But, of course, I 
need to refine it a bit and review it thoroughly. It ended up in several .sql 
scripts called by a master script. This naturally implies a .zip file as the 
delivery vehicle.

Nobody has told me how an outsider like me can deliver such a .zip file, 
together with its typographically nuanced external documentation, to readers of 
plsql-general. So this is what I'll do:

I'll create a placeholder GitHub issue in "yugabyte/yugabyte-db" and send you 
the URL. Anybody can access this repo, read the account of the issues, and 
download an attached .zip of a testcase. I'll mention in my account that the 
behavior that I observe in YugabyteDB reproduces exactly in PG 14.4, that the 
YugabyteDB issue is filed for tracking purposes, and that I'll update the 
account with more information from the PG folks in due course.

Please bear with me. It might be a few days before I'm able to send you all the 
promised URL.


[*] The PG doc favors using "role" in sentences like this rather than 
"user"—even though a role might have child roles in an indefinitely deep 
hierarchy. I trust that this point is fully separable from what matters in the 
present triggers scenario.

Re: Surprising results from current_role in a "security invoker" trigger function in a "cascade delete via FK" scenario

2022-08-11 Thread Bryn Llewellyn
> rjuju...@gmail.com wrote:
> 
>> b...@yugabyte.com wrote:
>> 
>> Nobody has told me how an outsider like me can deliver such a .zip file, 
>> together with its typographically nuanced external documentation…
> 
> You mentioned previously that "Email attachments don't make it to the archive 
> for posts to this list", but they should. It seems that you're using apple 
> mail, which is famous for having such problems, see [1] for instance.
> 
> Using a different MUA, or configuring apple mail to correctly put attachment 
> as attachment will solve this problem.
> 
> [1]: 
> https://www.postgresql.org/message-id/CABUevEwEw35g7n3peoqmpWraQuRvounck7puDUWU-S-%3DyfsoEA%40mail.gmail.com


Hmm… I use a modern MacBook with the always current macOS Big Sur. (One day 
I'll pluck up courage and get to Monterey.) I use the native "Mail.app" email 
client (a.k.a. Mail User Agent) at whatever version comes with the native macOS 
upgrade process. I use this to send attachments all the time to no end of 
friends and colleagues—without issue. So how can there be anything wrong with 
how my "Mail.app" is configured? Is the configuration a pairwise notion so that 
I need dedicated settings to send to the pgsql-general list?

Yugabyte uses an email service from Google. I suppose that I could use their 
ordinary browser-based interface to send emails to the pgsql-general list. It 
would certainly be good to fix this for future exchanges. But for now, I'll 
stick to my plan. This will make it easy for me to draw this issue to the 
attention of colleagues and to give me a place where I add updates about 
progress on the issue.

Moreover, GitHub allows Markdown formatting. And the ability to format even a 
shortish essay with ordinary modern devices like heading levels, bullet lists, 
italics, and especially code blocks makes an enormous difference to readability.

The conventions that this list's archive imposes (only plain text, quoted 
content indicated with successively deep chevron-style marks, explicit URLs 
twice as long as your arm, and baked-in hard line breaks at about a dozen 
words) make comprehension quite hard—and structuring an account well-nigh 
impossible.



Re: Surprising results from current_role in a "security invoker" trigger function in a "cascade delete via FK" scenario

2022-08-11 Thread Bryn Llewellyn
david.g.johns...@gmail.com wrote:

Good. I can see the attachments here too:

https://www.postgresql.org/message-id/CAKFQuwaNcC2NPtwNY%2BNvbbHFQuphfkicvLxcnCbnwTTwKf%3DGdw%40mail.gmail.com

And I can download via the links with no problem. I'll aim to work out what's 
wrong at my end as soon as I can.

Re: Can I get the number of results plus the results with a single query?

2022-08-15 Thread Bryn Llewellyn
>>> p...@easesoftware.com  wrote:
>>> 
>>> Currently I’m doing this with two queries such as:
>>> 
>>> SELECT COUNT(*) FROM table WHERE …. expression …
>>> SELECT * FROM table WHERE …. expression …
>>> 
>>> But this requires two queries. Is there a way to do the same thing with 
>>> just one query?
>> 
>> david.g.johns...@gmail.com wrote:
>> 
>> Use count as a window function.
> 
> hjp-pg...@hjp.at wrote:
> 
> I don't think there can be [a way to do the same thing with just one query].

How about this:

create table t(k serial primary key, v int not null);
insert into t(v) values (7), (19), (42), (57), (100), (200), (300);

create function f()
  returns table(z text)
  language plpgsql
  stable
as $body$
declare
  r int not null := 0;
  results constant int[] :=
(
  select array_agg(v order by v) from t where v < 100
);
begin
  z := 'Count(*): '||cardinality(results);return next;

  foreach r in array results loop
z := r::int;  return next;
  end loop;
end;
$body$;

select f();

It runs without error and gets this result:

 Count(*): 4
 7
 19
 42
 57

Is this what you want, Perry? B.t.w., your second "count(*)" might give a 
misleading answer unless you use "repeatable read" or "serializable"—and it's 
generally a good plan to avoid those isolation levels unless your use case 
forces no other choice.

I've never heard anybody say that this "bulk collect" approach (using Oracle 
Database's vocabulary) harms performance. Does anybody think that it might?
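
For completeness, I take David's window-function suggestion to look something 
like this (my guess at what he meant, reusing the same table):

select count(*) over () as total_rows, k, v
from t
where v < 100
order by v;

Every row of the result then carries the total row count alongside the data.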




Re: Can I get the number of results plus the results with a single query?

2022-08-16 Thread Bryn Llewellyn
> hjp-pg...@hjp.at wrote:
> 
> That's not quite what I meant. I meant "I don't think there can be what you 
> want with just one query",
> 
> The OP wants some kind of progress indicator. To be useful, such an indicator 
> should be approximately linear in time. I.e. if your query returns 10000 rows 
> in 5 minutes (yes, that's slow, but you don't need a progress bar for fast 
> queries), it should display "0/10000" after 0 seconds, "33/10000" after 1 
> second, "2000/10000" after 1 minute, etc. That ideal is of course 
> unrealistic, it's quite ok if it displays "0/unknown" for a few seconds and 
> speeds up and slows down during execution. But if it displays "0/unknown" for 
> 4 minutes and 55 seconds and then counts up to 10000 during the last 5 
> seconds, the progress
> indicator is useless.
> 
> You are stuffing the whole result into an array and THEN counting the number 
> of elements. So when you get to the count all of the work (except sending the 
> result to the client) is already done, so there is little point in displaying 
> a progress indicator.

I see, Peter. You’d read the OP’s mind. But I’d failed to. I saw the subject, 
so I assumed that the OP wanted the entire result set together with the count 
of the results. (After all, there are no inflections of “page” in the OP’s 
question.)

It seems, then, that the requirement is to page through results like a Google 
search for « SQL injection » does. The first page shows this

« About 29,400,000 results (0.45 seconds) »

And then it lets you choose other pages to see. That result of about 30 million 
is famously approximate. After all, nobody would be able to disprove its 
correctness. And paging through the results from a SQL query in a stateless 
browser client has its own (in)famous characteristics.

It sounds like the OP wants a fast approximate count for a query whose 
restriction isn’t known until runtime. (His example uses just a single table). 
The planner makes estimates like this. Might something along these lines 
provide what’s being sought:

create table t(k int primary key);
insert into t(k) select generate_series(1, 1000000);
explain (costs true) select * from t where k between 20000 and 20050;

This is what it produced for me:

 Index Only Scan using t_pkey on t  (cost=0.42..5.39 rows=48 width=4)
   Index Cond: ((k >= 20000) AND (k <= 20050))

Seems to me that "rows=48” is a perfectly reasonable estimate…
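
And if the client wants that estimate as a number rather than by reading the 
plan by eye, something along these lines could work (my own sketch, not 
something anybody suggested in this thread):

create function estimated_count(query in text)
  returns bigint
  language plpgsql
as $body$
declare
  plan json;
begin
  execute 'explain (format json) '||query into plan;
  return (plan->0->'Plan'->>'Plan Rows')::bigint;
end;
$body$;

select estimated_count('select * from t where k between 20000 and 20050');

The number is, of course, only as good as the table's statistics.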

Re: Surprising results from current_role in a "security invoker" trigger function in a "cascade delete via FK" scenario

2022-08-23 Thread Bryn Llewellyn
> b...@yugabyte.com wrote:
> 
>> david.g.johns...@gmail.com wrote:
>> 
>> You are correct that the behavior here is not explicitly documented [where] 
>> I would expect to find it.
>> 
>> My expectation is that the trigger owner is the context in which the trigger 
>> function is executed. Table owners can grant the ability to other roles to 
>> create triggers on the owned tables should the trigger owner and table owner 
>> require different roles. Triggers are like views in this regard.
>> 
>> I don't expect cascade update/cascade delete to be a factor here at all, 
>> other than making the process easier to perform. This extends from the prior 
>> expectation.
>> 
>> I expect [all this] not because I can point to a place where all that is 
>> said.
> 
> Good. We're converging. Thanks, David… So this is what I'll do:
> 
> I'll create a placeholder GitHub issue in "yugabyte/yugabyte-db" and send you 
> the URL. Anybody can access this repo, read the account of the issues, and 
> download an attached .zip of a testcase. I'll mention in my account that the 
> behavior that I observe in YugabyteDB reproduces exactly in PG 14.4, that the 
> YugabyteDB issue is filed for tracking purposes, and that I'll update the 
> account with more information from the PG folks in due course.

I've submitted the promised GitHub issue. It has an attached .zip of a 
self-contained, one-touch testcase. Here it is:

https://github.com/yugabyte/yugabyte-db/issues/13736 


Everything points to a bug in that, no matter what the intended behavior is, 
the same rules should apply in all scenarios—but they don't. Further, I agree 
with David's:

>> I don't expect cascade update/cascade delete to be a factor here at all.


My testcase tries a gazillion variations (including what happens with a 
function-based constraint and when DML to one table "cascades" to another 
because of trigger action). After all, in the presence of one bug where 
something unexpected happens in one scenario, who knows what unexpected 
outcomes might be seen in other comparable scenarios?

Here's my summary of my findings, copied from the issues report:

«
It very much does seems that the intended rule is this:

- The value that current_role will report in a security invoker trigger 
function for a DML trigger is the role that does the DML to the table that 
fires the trigger.

It seems, too, that this testcase has revealed a long-standing bug—present at 
least as far back as PostgreSQL Version 11:

- The value that current_role will report in a security invoker trigger 
function for a DML trigger on a "details" table will be the owner of that table 
when its rows are cascade-deleted as a consequence of deleting its parent 
"masters" row. However, this buggy outcome is seen only for "before delete" 
triggers, both at “statement" level and at "row" level.

- The bug has an obvious downstream consequence: any operation on other tables 
that are done by such a trigger function that sees the wrong current_role will 
be executed by that wrong role—and so on down the line.
»

W.r.t. David's 

>> My expectation is that the trigger owner is the context in which the trigger 
>> function is executed.


This can't be right because a trigger doesn't have an owner. You can see this 
from the "pg_trigger" table. It has its own "oid" and a "tgrelid". But it has 
no FK to "pg_roles" or to "pg_namespace”. In other words, a trigger isn't a 
first-class schema object. Rather, from the user's P.o.V., it’s uniquely 
identified by the table to which it's attached. In this sense, it's like a 
constraint (and especially like one that's based on a function). Each of these 
two, trigger and function-based-constraint, is a device that associates a 
"call-back" function with a table so that the function is invoked in response 
to DML to the table.

The "pg_constraint" table, too, has no FK to "pg_roles". Mysteriously, though, 
it does have a "connamespace" column. Presumably this is a denormalization such 
that its value is always equal to "relnamespace" in "pg_class" for the table to 
which the constraint is attached.

It seems to me, therefore, that the role that creates the trigger is out of the 
picture once the trigger has been created. (There's no analogy here for a 
constraint because there's no "grant alter" on a table to correspond to "grant 
trigger" on a table.)

It seems, too, that the owner of the trigger function (and of the constraint 
function) is out of the picture at run-time (when these are "security invoker") 
for determining the value that "current_role" in such a function will report.

In other words, and as I see it, there are only two candidate answers: the role 
that does the DML that causes the function to be invoked; and the role that 
owns the table—DML to which causes the function to be invoked.

In most cases, current_role here shows who does the DML. But in those two rare 
corner cases that my testcase exposed, it shows the owner of the "details" 
table instead.
